Overview

Dataset statistics

Number of variables: 22
Number of observations: 6095
Missing cells: 11353
Missing cells (%): 8.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.7 MiB
Average record size in memory: 1.5 KiB
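
The derived figures in this block can be checked from the raw counts. The sketch below assumes the missing-cell percentage is taken over every cell (variables times observations) and that the average record size is the total memory divided by the row count; the report itself does not spell out either definition.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 22
n_observations = 6095
missing_cells = 11353
total_size_mib = 8.7

# Assumption: the percentage is computed over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows.
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions both values round to the figures reported above.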

Variable types

Text: 17
Numeric: 2
Categorical: 3

Alerts

Year is highly overall correlated with original order [High correlation]
original order is highly overall correlated with Year [High correlation]
Sex is highly imbalanced (80.2%) [Imbalance]
Fatal (Y/N) is highly imbalanced (69.8%) [Imbalance]
Area has 413 (6.8%) missing values [Missing]
Location has 512 (8.4%) missing values [Missing]
Activity has 536 (8.8%) missing values [Missing]
Name has 207 (3.4%) missing values [Missing]
Sex has 578 (9.5%) missing values [Missing]
Age has 2721 (44.6%) missing values [Missing]
Time has 3247 (53.3%) missing values [Missing]
Species has 2996 (49.2%) missing values [Missing]
original order is uniformly distributed [Uniform]
Year has 124 (2.0%) zeros [Zeros]
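
The report does not say how the imbalance percentage is derived. One reading that comes out close to the Sex figure is one minus the Shannon entropy of the category counts, normalized by the logarithm of the number of categories. The sketch below is written under that assumption; the function name `imbalance_score` is hypothetical, and the counts are the non-missing Sex values listed in that variable's Common Values table.

```python
import math

# Non-missing Sex value counts, copied from the Common Values table of this
# report. The key "M " stands for the variant of M that carries one space.
sex_counts = {"M": 4906, "F": 606, "M ": 2, "lli": 1, "N": 1, ".": 1}


def imbalance_score(counts):
    """Return 1 - H / log(k), an ASSUMED definition of the imbalance score.

    H is the Shannon entropy of the class frequencies and k the number of
    classes. 0 means perfectly balanced, 1 means a single dominant class.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


score = imbalance_score(list(sex_counts.values()))
print(f"Sex imbalance: {score:.1%}")
```

Because the score depends on the number of distinct categories, the four stray values (three of them seen only once) raise it noticeably compared with a clean two-category column.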

Reproduction

Analysis started: 2023-11-25 14:11:05.678150
Analysis finished: 2023-11-25 14:11:19.658361
Duration: 13.98 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
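
The reported duration is the difference between the two timestamps. A minimal check, assuming both are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-11-25 14:11:05.678150", FMT)
finished = datetime.strptime("2023-11-25 14:11:19.658361", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```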

Variables

Distinct: 6078
Distinct (%): 99.7%
Missing: 1
Missing (%): < 0.1%
Memory size: 402.5 KiB

Length

Max length: 18
Median length: 10
Mean length: 10.613718
Min length: 6

Characters and Unicode

Total characters: 64680
Distinct characters: 34
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
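
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values listed for this variable. The `LABELS` mapping from two-letter category codes to the names used in this report is written out by hand and covers only the categories this variable contains. The standard library does not expose script or block lookups, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# Sample values copied from this variable's "Sample" rows.
samples = [
    "2017.06.11",
    "2017.06.10.b",
    "2017.06.10.a",
    "2017.06.07.R",
    "2017.06.04",
]

# Hand-written mapping from general category codes to the report's labels.
LABELS = {
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
}

category_counts = Counter()
for value in samples:
    for ch in value:
        code = unicodedata.category(ch)  # e.g. "Nd" for the digit "2"
        category_counts[LABELS.get(code, code)] += 1

for label, count in category_counts.most_common():
    print(f"{label}: {count}")
```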

Unique

Unique: 6062
Unique (%): 99.5%

Sample

1st row: 2017.06.11
2nd row: 2017.06.10.b
3rd row: 2017.06.10.a
4th row: 2017.06.07.R
5th row: 2017.06.04
ValueCountFrequency (%)
1923.00.00.a 2
 
< 0.1%
1990.05.10 2
 
< 0.1%
2
 
< 0.1%
b 2
 
< 0.1%
2009.12.18 2
 
< 0.1%
1954.00.00 2
 
< 0.1%
2013.10.05 2
 
< 0.1%
2014.08.02 2
 
< 0.1%
1915.07.06.a.r 2
 
< 0.1%
2006.09.02 2
 
< 0.1%
Other values (6071) 6081
99.7%

Most occurring characters

ValueCountFrequency (%)
. 14088
21.8%
0 12992
20.1%
1 10237
15.8%
2 5887
9.1%
9 5798
9.0%
8 2706
 
4.2%
6 2379
 
3.7%
7 2150
 
3.3%
5 2112
 
3.3%
3 2066
 
3.2%
Other values (24) 4265
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 48262
74.6%
Other Punctuation 14092
 
21.8%
Lowercase Letter 1526
 
2.4%
Uppercase Letter 757
 
1.2%
Dash Punctuation 33
 
0.1%
Space Separator 10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 638
41.8%
b 623
40.8%
c 131
 
8.6%
d 52
 
3.4%
e 29
 
1.9%
f 17
 
1.1%
g 11
 
0.7%
h 8
 
0.5%
j 5
 
0.3%
i 5
 
0.3%
Other values (5) 7
 
0.5%
Decimal Number
ValueCountFrequency (%)
0 12992
26.9%
1 10237
21.2%
2 5887
12.2%
9 5798
12.0%
8 2706
 
5.6%
6 2379
 
4.9%
7 2150
 
4.5%
5 2112
 
4.4%
3 2066
 
4.3%
4 1935
 
4.0%
Other Punctuation
ValueCountFrequency (%)
. 14088
> 99.9%
& 2
 
< 0.1%
, 1
 
< 0.1%
/ 1
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
R 519
68.6%
D 119
 
15.7%
N 119
 
15.7%
Dash Punctuation
ValueCountFrequency (%)
- 33
100.0%
Space Separator
ValueCountFrequency (%)
10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 62397
96.5%
Latin 2283
 
3.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 638
27.9%
b 623
27.3%
R 519
22.7%
c 131
 
5.7%
D 119
 
5.2%
N 119
 
5.2%
d 52
 
2.3%
e 29
 
1.3%
f 17
 
0.7%
g 11
 
0.5%
Other values (8) 25
 
1.1%
Common
ValueCountFrequency (%)
. 14088
22.6%
0 12992
20.8%
1 10237
16.4%
2 5887
9.4%
9 5798
9.3%
8 2706
 
4.3%
6 2379
 
3.8%
7 2150
 
3.4%
5 2112
 
3.4%
3 2066
 
3.3%
Other values (6) 1982
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 64680
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 14088
21.8%
0 12992
20.1%
1 10237
15.8%
2 5887
9.1%
9 5798
9.0%
8 2706
 
4.2%
6 2379
 
3.7%
7 2150
 
3.3%
5 2112
 
3.3%
3 2066
 
3.2%
Other values (24) 4265
 
6.6%

Date
Text

Distinct: 5197
Distinct (%): 85.3%
Missing: 1
Missing (%): < 0.1%
Memory size: 405.3 KiB

Length

Max length: 64
Median length: 10
Mean length: 11.072202
Min length: 5

Characters and Unicode

Total characters: 67474
Distinct characters: 61
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4523
Unique (%): 74.2%

Sample

1st row: 2017-06-11
2nd row: 2017-06-10
3rd row: 2017-06-10
4th row: Reported 07-Jun-2017
5th row: 2017-06-04
ValueCountFrequency (%)
reported 513
 
7.3%
before 85
 
1.2%
ca 35
 
0.5%
no 26
 
0.4%
date 26
 
0.4%
summer 17
 
0.2%
late 15
 
0.2%
13
 
0.2%
early 13
 
0.2%
1905-05-10 11
 
0.2%
Other values (5238) 6257
89.2%

Most occurring characters

ValueCountFrequency (%)
- 11700
17.3%
0 11041
16.4%
1 10043
14.9%
2 6254
9.3%
9 5530
8.2%
8 2543
 
3.8%
5 2365
 
3.5%
6 2334
 
3.5%
3 2029
 
3.0%
7 1996
 
3.0%
Other values (51) 11639
17.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 46064
68.3%
Dash Punctuation 11700
 
17.3%
Lowercase Letter 6763
 
10.0%
Uppercase Letter 1697
 
2.5%
Space Separator 1114
 
1.7%
Other Punctuation 122
 
0.2%
Close Punctuation 6
 
< 0.1%
Open Punctuation 6
 
< 0.1%
Control 1
 
< 0.1%
Modifier Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1555
23.0%
r 803
11.9%
o 735
10.9%
p 687
10.2%
t 655
9.7%
d 567
 
8.4%
a 350
 
5.2%
u 328
 
4.8%
n 213
 
3.1%
l 150
 
2.2%
Other values (12) 720
10.6%
Uppercase Letter
ValueCountFrequency (%)
R 517
30.5%
J 293
17.3%
A 174
 
10.3%
M 130
 
7.7%
S 118
 
7.0%
B 95
 
5.6%
N 90
 
5.3%
D 78
 
4.6%
F 57
 
3.4%
O 53
 
3.1%
Other values (7) 92
 
5.4%
Decimal Number
ValueCountFrequency (%)
0 11041
24.0%
1 10043
21.8%
2 6254
13.6%
9 5530
12.0%
8 2543
 
5.5%
5 2365
 
5.1%
6 2334
 
5.1%
3 2029
 
4.4%
7 1996
 
4.3%
4 1929
 
4.2%
Other Punctuation
ValueCountFrequency (%)
. 79
64.8%
, 21
 
17.2%
" 9
 
7.4%
& 7
 
5.7%
? 4
 
3.3%
/ 2
 
1.6%
Dash Punctuation
ValueCountFrequency (%)
- 11700
100.0%
Space Separator
ValueCountFrequency (%)
1114
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6
100.0%
Control
ValueCountFrequency (%)
1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 59014
87.5%
Latin 8460
 
12.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1555
18.4%
r 803
9.5%
o 735
 
8.7%
p 687
 
8.1%
t 655
 
7.7%
d 567
 
6.7%
R 517
 
6.1%
a 350
 
4.1%
u 328
 
3.9%
J 293
 
3.5%
Other values (29) 1970
23.3%
Common
ValueCountFrequency (%)
- 11700
19.8%
0 11041
18.7%
1 10043
17.0%
2 6254
10.6%
9 5530
9.4%
8 2543
 
4.3%
5 2365
 
4.0%
6 2334
 
4.0%
3 2029
 
3.4%
7 1996
 
3.4%
Other values (12) 3179
 
5.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 67474
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 11700
17.3%
0 11041
16.4%
1 10043
14.9%
2 6254
9.3%
9 5530
8.2%
8 2543
 
3.8%
5 2365
 
3.5%
6 2334
 
3.5%
3 2029
 
3.0%
7 1996
 
3.0%
Other values (51) 11639
17.2%

Year
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 240
Distinct (%): 3.9%
Missing: 3
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1926.1973
Minimum: 0
Maximum: 2017
Zeros: 124
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1862
Q1: 1942
median: 1976
Q3: 2004
95-th percentile: 2015
Maximum: 2017
Range: 2017
Interquartile range (IQR): 62

Descriptive statistics

Standard deviation: 284.36642
Coefficient of variation (CV): 0.14763099
Kurtosis: 40.644989
Mean: 1926.1973
Median Absolute Deviation (MAD): 29
Skewness: -6.4397286
Sum: 11734394
Variance: 80864.262
Monotonicity: Not monotonic
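
Several of the Year statistics follow from one another, which gives a quick consistency check. The sketch below uses only values printed in this report and assumes the usual textbook relations (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean times the number of non-missing values). Small differences are expected because the inputs are rounded.

```python
# Values copied from the Year summary in this report.
n_observations = 6095
n_missing = 3
mean = 1926.1973
std = 284.36642
q1, q3 = 1942, 2004
minimum, maximum = 0, 2017

count = n_observations - n_missing  # non-missing Year values
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum     # range
cv = std / mean                     # coefficient of variation
variance = std ** 2                 # variance from the standard deviation
total = mean * count                # sum reconstructed from the mean

print(f"IQR={iqr}, range={value_range}")
print(f"CV={cv:.8f}, variance={variance:.3f}, sum={total:.0f}")
```

The 124 zero entries sit far below the 5-th percentile of 1862, which is consistent with the strongly negative skewness and with a mean (1926.1973) well under the median (1976).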
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2015                141    2.3%
2016                128    2.1%
2011                128    2.1%
2014                126    2.1%
0                   124    2.0%
2013                122    2.0%
2008                122    2.0%
2009                120    2.0%
2012                117    1.9%
2007                112    1.8%
Other values (230)  4852   79.6%

Value               Count  Frequency (%)
0                   124    2.0%
5                   1      < 0.1%
77                  1      < 0.1%
500                 1      < 0.1%
1543                1      < 0.1%
1554                1      < 0.1%
1555                1      < 0.1%
1580                1      < 0.1%
1595                1      < 0.1%
1617                1      < 0.1%

Value               Count  Frequency (%)
2017                54     0.9%
2016                128    2.1%
2015                141    2.3%
2014                126    2.1%
2013                122    2.0%
2012                117    1.9%
2011                128    2.1%
2010                101    1.7%
2009                120    2.0%
2008                122    2.0%

Type
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 5
Missing (%): 0.1%
Memory size: 395.1 KiB
Top categories: Unprovoked (4466), Provoked (563), Invalid (529), Sea Disaster (220), Boat (202)

Length

Max length: 12
Median length: 10
Mean length: 9.3735632
Min length: 4

Characters and Unicode

Total characters: 57085
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unprovoked
2nd row: Unprovoked
3rd row: Unprovoked
4th row: Unprovoked
5th row: Unprovoked

Common Values

Value         Count  Frequency (%)
Unprovoked    4466   73.3%
Provoked      563    9.2%
Invalid       529    8.7%
Sea Disaster  220    3.6%
Boat          202    3.3%
Boating       110    1.8%
(Missing)     5      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
unprovoked 4466
70.8%
provoked 563
 
8.9%
invalid 529
 
8.4%
sea 220
 
3.5%
disaster 220
 
3.5%
boat 202
 
3.2%
boating 110
 
1.7%

Most occurring characters

ValueCountFrequency (%)
o 10370
18.2%
v 5558
9.7%
d 5558
9.7%
e 5469
9.6%
r 5249
9.2%
n 5105
8.9%
k 5029
8.8%
U 4466
7.8%
p 4466
7.8%
a 1281
 
2.2%
Other values (11) 4534
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 50555
88.6%
Uppercase Letter 6310
 
11.1%
Space Separator 220
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 10370
20.5%
v 5558
11.0%
d 5558
11.0%
e 5469
10.8%
r 5249
10.4%
n 5105
10.1%
k 5029
9.9%
p 4466
8.8%
a 1281
 
2.5%
i 859
 
1.7%
Other values (4) 1611
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
U 4466
70.8%
P 563
 
8.9%
I 529
 
8.4%
B 312
 
4.9%
S 220
 
3.5%
D 220
 
3.5%
Space Separator
ValueCountFrequency (%)
220
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 56865
99.6%
Common 220
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 10370
18.2%
v 5558
9.8%
d 5558
9.8%
e 5469
9.6%
r 5249
9.2%
n 5105
9.0%
k 5029
8.8%
U 4466
7.9%
p 4466
7.9%
a 1281
 
2.3%
Other values (10) 4314
7.6%
Common
ValueCountFrequency (%)
220
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 57085
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 10370
18.2%
v 5558
9.7%
d 5558
9.7%
e 5469
9.6%
r 5249
9.2%
n 5105
8.9%
k 5029
8.8%
U 4466
7.8%
p 4466
7.8%
a 1281
 
2.2%
Other values (11) 4534
7.9%

Distinct: 204
Distinct (%): 3.4%
Missing: 47
Missing (%): 0.8%
Memory size: 380.0 KiB

Length

Max length: 37
Median length: 30
Mean length: 7.0656415
Min length: 3

Characters and Unicode

Total characters: 42733
Distinct characters: 53
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 81
Unique (%): 1.3%

Sample

1st row: AUSTRALIA
2nd row: AUSTRALIA
3rd row: USA
4th row: UNITED KINGDOM
5th row: USA
ValueCountFrequency (%)
usa 2160
28.9%
australia 1303
17.5%
south 594
 
8.0%
africa 572
 
7.7%
new 327
 
4.4%
guinea 148
 
2.0%
papua 133
 
1.8%
zealand 126
 
1.7%
brazil 103
 
1.4%
bahamas 101
 
1.4%
Other values (203) 1900
25.4%

Most occurring characters

ValueCountFrequency (%)
A 10069
23.6%
U 4687
11.0%
S 4682
11.0%
I 3573
 
8.4%
R 2487
 
5.8%
T 2349
 
5.5%
L 2089
 
4.9%
N 1775
 
4.2%
E 1581
 
3.7%
1437
 
3.4%
Other values (43) 8004
18.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 41197
96.4%
Space Separator 1437
 
3.4%
Lowercase Letter 68
 
0.2%
Other Punctuation 24
 
0.1%
Close Punctuation 3
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 10069
24.4%
U 4687
11.4%
S 4682
11.4%
I 3573
 
8.7%
R 2487
 
6.0%
T 2349
 
5.7%
L 2089
 
5.1%
N 1775
 
4.3%
E 1581
 
3.8%
O 1344
 
3.3%
Other values (16) 6561
15.9%
Lowercase Letter
ValueCountFrequency (%)
e 12
17.6%
i 10
14.7%
r 7
10.3%
s 5
7.4%
o 5
7.4%
t 5
7.4%
n 4
 
5.9%
a 4
 
5.9%
l 3
 
4.4%
j 3
 
4.4%
Other values (8) 10
14.7%
Other Punctuation
ValueCountFrequency (%)
& 9
37.5%
/ 7
29.2%
? 5
20.8%
. 2
 
8.3%
, 1
 
4.2%
Space Separator
ValueCountFrequency (%)
1437
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 41265
96.6%
Common 1468
 
3.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 10069
24.4%
U 4687
11.4%
S 4682
11.3%
I 3573
 
8.7%
R 2487
 
6.0%
T 2349
 
5.7%
L 2089
 
5.1%
N 1775
 
4.3%
E 1581
 
3.8%
O 1344
 
3.3%
Other values (34) 6629
16.1%
Common
ValueCountFrequency (%)
1437
97.9%
& 9
 
0.6%
/ 7
 
0.5%
? 5
 
0.3%
) 3
 
0.2%
( 3
 
0.2%
. 2
 
0.1%
- 1
 
0.1%
, 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 42733
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 10069
23.6%
U 4687
11.0%
S 4682
11.0%
I 3573
 
8.4%
R 2487
 
5.8%
T 2349
 
5.5%
L 2089
 
4.9%
N 1775
 
4.2%
E 1581
 
3.7%
1437
 
3.4%
Other values (43) 8004
18.7%

Area
Text

MISSING 

Distinct: 799
Distinct (%): 14.1%
Missing: 413
Missing (%): 6.8%
Memory size: 397.9 KiB

Length

Max length: 62
Median length: 49
Mean length: 12.106829
Min length: 4

Characters and Unicode

Total characters: 68791
Distinct characters: 83
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 542
Unique (%): 9.5%

Sample

1st row: Western Australia
2nd row: Victoria
3rd row: Florida
4th row: South Devon
5th row: Florida
ValueCountFrequency (%)
florida 1017
 
10.3%
south 814
 
8.2%
province 652
 
6.6%
new 618
 
6.2%
wales 476
 
4.8%
western 391
 
3.9%
cape 352
 
3.5%
queensland 308
 
3.1%
hawaii 295
 
3.0%
california 292
 
2.9%
Other values (842) 4703
47.4%

Most occurring characters

ValueCountFrequency (%)
a 8247
 
12.0%
e 5329
 
7.7%
r 4927
 
7.2%
i 4885
 
7.1%
o 4565
 
6.6%
4323
 
6.3%
n 4039
 
5.9%
l 4015
 
5.8%
t 3178
 
4.6%
s 2952
 
4.3%
Other values (73) 22331
32.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 53656
78.0%
Uppercase Letter 10171
 
14.8%
Space Separator 4323
 
6.3%
Dash Punctuation 310
 
0.5%
Decimal Number 133
 
0.2%
Other Punctuation 122
 
0.2%
Close Punctuation 29
 
< 0.1%
Open Punctuation 29
 
< 0.1%
Other Letter 12
 
< 0.1%
Control 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 8247
15.4%
e 5329
9.9%
r 4927
9.2%
i 4885
9.1%
o 4565
8.5%
n 4039
7.5%
l 4015
7.5%
t 3178
 
5.9%
s 2952
 
5.5%
u 2486
 
4.6%
Other values (23) 9033
16.8%
Uppercase Letter
ValueCountFrequency (%)
S 1187
11.7%
C 1125
11.1%
N 1117
11.0%
F 1032
10.1%
W 916
 
9.0%
P 894
 
8.8%
A 471
 
4.6%
I 398
 
3.9%
H 349
 
3.4%
Q 323
 
3.2%
Other values (16) 2359
23.2%
Decimal Number
ValueCountFrequency (%)
0 38
28.6%
3 17
12.8%
2 17
12.8%
8 15
 
11.3%
1 14
 
10.5%
5 12
 
9.0%
4 7
 
5.3%
6 5
 
3.8%
9 4
 
3.0%
7 4
 
3.0%
Other Punctuation
ValueCountFrequency (%)
, 51
41.8%
' 24
19.7%
. 21
17.2%
& 19
 
15.6%
" 2
 
1.6%
/ 2
 
1.6%
? 2
 
1.6%
: 1
 
0.8%
Space Separator
ValueCountFrequency (%)
4323
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 310
100.0%
Close Punctuation
ValueCountFrequency (%)
) 29
100.0%
Open Punctuation
ValueCountFrequency (%)
( 29
100.0%
Other Letter
ValueCountFrequency (%)
º 12
100.0%
Control
ValueCountFrequency (%)
Â’ 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 63839
92.8%
Common 4952
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 8247
12.9%
e 5329
 
8.3%
r 4927
 
7.7%
i 4885
 
7.7%
o 4565
 
7.2%
n 4039
 
6.3%
l 4015
 
6.3%
t 3178
 
5.0%
s 2952
 
4.6%
u 2486
 
3.9%
Other values (50) 19216
30.1%
Common
ValueCountFrequency (%)
4323
87.3%
- 310
 
6.3%
, 51
 
1.0%
0 38
 
0.8%
) 29
 
0.6%
( 29
 
0.6%
' 24
 
0.5%
. 21
 
0.4%
& 19
 
0.4%
3 17
 
0.3%
Other values (13) 91
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 68743
99.9%
None 48
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 8247
 
12.0%
e 5329
 
7.8%
r 4927
 
7.2%
i 4885
 
7.1%
o 4565
 
6.6%
4323
 
6.3%
n 4039
 
5.9%
l 4015
 
5.8%
t 3178
 
4.6%
s 2952
 
4.3%
Other values (63) 22283
32.4%
None
ValueCountFrequency (%)
º 12
25.0%
á 6
12.5%
é 6
12.5%
Â’ 6
12.5%
ó 5
10.4%
ã 5
10.4%
ô 3
 
6.2%
î 2
 
4.2%
É 2
 
4.2%
ò 1
 
2.1%

Location
Text

MISSING 

Distinct: 3984
Distinct (%): 71.4%
Missing: 512
Missing (%): 8.4%
Memory size: 458.4 KiB

Length

Max length: 119
Median length: 79
Mean length: 22.894859
Min length: 3

Characters and Unicode

Total characters: 127822
Distinct characters: 99
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3309
Unique (%): 59.3%

Sample

1st row: Point Casuarina, Bunbury
2nd row: Flinders, Mornington Penisula
3rd row: Ponce Inlet, Volusia County
4th row: Bantham Beach
5th row: Middle Sambo Reef off Boca Chica, Monroe County
ValueCountFrequency (%)
beach 1512
 
7.6%
county 1436
 
7.2%
island 599
 
3.0%
bay 489
 
2.5%
of 333
 
1.7%
volusia 305
 
1.5%
off 304
 
1.5%
river 256
 
1.3%
near 255
 
1.3%
new 245
 
1.2%
Other values (3894) 14134
71.1%

Most occurring characters

ValueCountFrequency (%)
14629
 
11.4%
a 12755
 
10.0%
e 9554
 
7.5%
o 8042
 
6.3%
n 8001
 
6.3%
r 6120
 
4.8%
t 5926
 
4.6%
i 5135
 
4.0%
l 4906
 
3.8%
u 4397
 
3.4%
Other values (89) 48357
37.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90809
71.0%
Uppercase Letter 17522
 
13.7%
Space Separator 14631
 
11.4%
Other Punctuation 3738
 
2.9%
Decimal Number 707
 
0.6%
Dash Punctuation 141
 
0.1%
Open Punctuation 93
 
0.1%
Close Punctuation 93
 
0.1%
Control 79
 
0.1%
Other Letter 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 12755
14.0%
e 9554
10.5%
o 8042
 
8.9%
n 8001
 
8.8%
r 6120
 
6.7%
t 5926
 
6.5%
i 5135
 
5.7%
l 4906
 
5.4%
u 4397
 
4.8%
s 4226
 
4.7%
Other values (30) 21747
23.9%
Uppercase Letter
ValueCountFrequency (%)
B 2922
16.7%
C 2478
14.1%
S 1655
 
9.4%
P 1344
 
7.7%
M 1127
 
6.4%
I 931
 
5.3%
R 736
 
4.2%
N 712
 
4.1%
H 677
 
3.9%
L 535
 
3.1%
Other values (18) 4405
25.1%
Decimal Number
ValueCountFrequency (%)
0 198
28.0%
1 114
16.1%
2 88
12.4%
5 84
11.9%
3 61
 
8.6%
4 44
 
6.2%
6 38
 
5.4%
7 33
 
4.7%
8 30
 
4.2%
9 17
 
2.4%
Other Punctuation
ValueCountFrequency (%)
, 3158
84.5%
' 292
 
7.8%
. 177
 
4.7%
& 52
 
1.4%
/ 29
 
0.8%
? 19
 
0.5%
" 10
 
0.3%
: 1
 
< 0.1%
Control
ValueCountFrequency (%)
Â’ 73
92.4%
‘ 2
 
2.5%
” 1
 
1.3%
“ 1
 
1.3%
š 1
 
1.3%
– 1
 
1.3%
Space Separator
ValueCountFrequency (%)
14629
> 99.9%
  2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 141
100.0%
Open Punctuation
ValueCountFrequency (%)
( 93
100.0%
Close Punctuation
ValueCountFrequency (%)
) 93
100.0%
Other Letter
ValueCountFrequency (%)
º 8
100.0%
Other Number
ValueCountFrequency (%)
½ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 108339
84.8%
Common 19483
 
15.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 12755
 
11.8%
e 9554
 
8.8%
o 8042
 
7.4%
n 8001
 
7.4%
r 6120
 
5.6%
t 5926
 
5.5%
i 5135
 
4.7%
l 4906
 
4.5%
u 4397
 
4.1%
s 4226
 
3.9%
Other values (59) 39277
36.3%
Common
ValueCountFrequency (%)
14629
75.1%
, 3158
 
16.2%
' 292
 
1.5%
0 198
 
1.0%
. 177
 
0.9%
- 141
 
0.7%
1 114
 
0.6%
( 93
 
0.5%
) 93
 
0.5%
2 88
 
0.5%
Other values (20) 500
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 127672
99.9%
None 150
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14629
 
11.5%
a 12755
 
10.0%
e 9554
 
7.5%
o 8042
 
6.3%
n 8001
 
6.3%
r 6120
 
4.8%
t 5926
 
4.6%
i 5135
 
4.0%
l 4906
 
3.8%
u 4397
 
3.4%
Other values (64) 48207
37.8%
None
ValueCountFrequency (%)
Â’ 73
48.7%
é 17
 
11.3%
º 8
 
5.3%
ã 7
 
4.7%
á 7
 
4.7%
ñ 4
 
2.7%
è 4
 
2.7%
ó 4
 
2.7%
ú 3
 
2.0%
ÃŽ 3
 
2.0%
Other values (15) 20
 
13.3%

Activity
Text

MISSING 

Distinct: 1503
Distinct (%): 27.0%
Missing: 536
Missing (%): 8.8%
Memory size: 418.3 KiB

Length

Max length: 255
Median length: 242
Mean length: 16.647958
Min length: 1

Characters and Unicode

Total characters: 92546
Distinct characters: 80
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1289
Unique (%): 23.2%

Sample

1st row: Body boarding
2nd row: Surfing
3rd row: Surfing
4th row: Surfing
5th row: Spearfishing
ValueCountFrequency (%)
swimming 1077
 
7.5%
surfing 1060
 
7.4%
fishing 713
 
5.0%
diving 539
 
3.8%
spearfishing 423
 
3.0%
the 352
 
2.5%
267
 
1.9%
in 245
 
1.7%
a 236
 
1.7%
for 208
 
1.5%
Other values (1979) 9159
64.1%

Most occurring characters

ValueCountFrequency (%)
i 10695
 
11.6%
8982
 
9.7%
n 8045
 
8.7%
g 6069
 
6.6%
r 5346
 
5.8%
a 5294
 
5.7%
e 5257
 
5.7%
s 3805
 
4.1%
o 3640
 
3.9%
t 3363
 
3.6%
Other values (70) 32050
34.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 75206
81.3%
Space Separator 8982
 
9.7%
Uppercase Letter 6549
 
7.1%
Other Punctuation 853
 
0.9%
Decimal Number 584
 
0.6%
Dash Punctuation 170
 
0.2%
Close Punctuation 93
 
0.1%
Open Punctuation 93
 
0.1%
Control 16
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 10695
14.2%
n 8045
 
10.7%
g 6069
 
8.1%
r 5346
 
7.1%
a 5294
 
7.0%
e 5257
 
7.0%
s 3805
 
5.1%
o 3640
 
4.8%
t 3363
 
4.5%
h 3252
 
4.3%
Other values (18) 20440
27.2%
Uppercase Letter
ValueCountFrequency (%)
S 3177
48.5%
F 884
 
13.5%
B 478
 
7.3%
D 330
 
5.0%
W 304
 
4.6%
A 174
 
2.7%
P 163
 
2.5%
C 154
 
2.4%
T 149
 
2.3%
H 93
 
1.4%
Other values (15) 643
 
9.8%
Decimal Number
ValueCountFrequency (%)
1 101
17.3%
2 85
14.6%
0 72
12.3%
4 65
11.1%
3 62
10.6%
5 57
9.8%
9 40
 
6.8%
7 39
 
6.7%
6 35
 
6.0%
8 28
 
4.8%
Other Punctuation
ValueCountFrequency (%)
, 338
39.6%
& 178
20.9%
. 138
16.2%
/ 100
 
11.7%
' 56
 
6.6%
" 28
 
3.3%
? 9
 
1.1%
: 4
 
0.5%
; 2
 
0.2%
Control
ValueCountFrequency (%)
Â’ 12
75.0%
“ 2
 
12.5%
– 1
 
6.2%
” 1
 
6.2%
Space Separator
ValueCountFrequency (%)
8982
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 170
100.0%
Close Punctuation
ValueCountFrequency (%)
) 93
100.0%
Open Punctuation
ValueCountFrequency (%)
( 93
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 81755
88.3%
Common 10791
 
11.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 10695
 
13.1%
n 8045
 
9.8%
g 6069
 
7.4%
r 5346
 
6.5%
a 5294
 
6.5%
e 5257
 
6.4%
s 3805
 
4.7%
o 3640
 
4.5%
t 3363
 
4.1%
h 3252
 
4.0%
Other values (43) 26989
33.0%
Common
ValueCountFrequency (%)
8982
83.2%
, 338
 
3.1%
& 178
 
1.6%
- 170
 
1.6%
. 138
 
1.3%
1 101
 
0.9%
/ 100
 
0.9%
) 93
 
0.9%
( 93
 
0.9%
2 85
 
0.8%
Other values (17) 513
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 92528
> 99.9%
None 18
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 10695
 
11.6%
8982
 
9.7%
n 8045
 
8.7%
g 6069
 
6.6%
r 5346
 
5.8%
a 5294
 
5.7%
e 5257
 
5.7%
s 3805
 
4.1%
o 3640
 
3.9%
t 3363
 
3.6%
Other values (64) 32032
34.6%
None
ValueCountFrequency (%)
Â’ 12
66.7%
“ 2
 
11.1%
– 1
 
5.6%
ê 1
 
5.6%
í 1
 
5.6%
” 1
 
5.6%

Name
Text

MISSING 

Distinct: 5086
Distinct (%): 86.4%
Missing: 207
Missing (%): 3.4%
Memory size: 424.8 KiB

Length

Max length: 222
Median length: 111
Mean length: 15.120924
Min length: 1

Characters and Unicode

Total characters: 89032
Distinct characters: 100
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4998
Unique (%): 84.9%

Sample

1st row: Paul Goff
2nd row: female
3rd row: Bryan Brock
4th row: Rich Thomson
5th row: Parker Simpson
ValueCountFrequency (%)
male 604
 
4.1%
a 298
 
2.0%
232
 
1.6%
boat 174
 
1.2%
john 162
 
1.1%
occupants 153
 
1.0%
female 112
 
0.8%
the 95
 
0.6%
william 92
 
0.6%
james 86
 
0.6%
Other values (6013) 12618
86.3%

Most occurring characters

ValueCountFrequency (%)
9205
 
10.3%
a 8210
 
9.2%
e 7850
 
8.8%
n 5604
 
6.3%
r 5579
 
6.3%
o 5118
 
5.7%
i 4603
 
5.2%
l 4338
 
4.9%
s 3375
 
3.8%
t 3275
 
3.7%
Other values (90) 31875
35.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 65805
73.9%
Uppercase Letter 11391
 
12.8%
Space Separator 9207
 
10.3%
Other Punctuation 1906
 
2.1%
Decimal Number 441
 
0.5%
Dash Punctuation 110
 
0.1%
Open Punctuation 68
 
0.1%
Close Punctuation 68
 
0.1%
Control 27
 
< 0.1%
Connector Punctuation 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 8210
12.5%
e 7850
11.9%
n 5604
 
8.5%
r 5579
 
8.5%
o 5118
 
7.8%
i 4603
 
7.0%
l 4338
 
6.6%
s 3375
 
5.1%
t 3275
 
5.0%
m 2360
 
3.6%
Other values (29) 15493
23.5%
Uppercase Letter
ValueCountFrequency (%)
M 1035
 
9.1%
J 928
 
8.1%
S 906
 
8.0%
C 800
 
7.0%
B 726
 
6.4%
A 698
 
6.1%
R 673
 
5.9%
D 599
 
5.3%
H 543
 
4.8%
G 523
 
4.6%
Other values (17) 3960
34.8%
Other Punctuation
ValueCountFrequency (%)
. 861
45.2%
, 480
25.2%
& 215
 
11.3%
: 187
 
9.8%
' 98
 
5.1%
" 49
 
2.6%
; 10
 
0.5%
/ 2
 
0.1%
? 2
 
0.1%
# 1
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 119
27.0%
1 83
18.8%
4 53
12.0%
5 48
10.9%
3 35
 
7.9%
6 29
 
6.6%
0 25
 
5.7%
8 20
 
4.5%
7 17
 
3.9%
9 12
 
2.7%
Control
ValueCountFrequency (%)
Â’ 15
55.6%
4
 
14.8%
” 3
 
11.1%
“ 3
 
11.1%
‘ 1
 
3.7%
Â… 1
 
3.7%
Space Separator
ValueCountFrequency (%)
9205
> 99.9%
  2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 110
100.0%
Open Punctuation
ValueCountFrequency (%)
( 68
100.0%
Close Punctuation
ValueCountFrequency (%)
) 68
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 7
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 77196
86.7%
Common 11836
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 8210
 
10.6%
e 7850
 
10.2%
n 5604
 
7.3%
r 5579
 
7.2%
o 5118
 
6.6%
i 4603
 
6.0%
l 4338
 
5.6%
s 3375
 
4.4%
t 3275
 
4.2%
m 2360
 
3.1%
Other values (56) 26884
34.8%
Common
ValueCountFrequency (%)
9205
77.8%
. 861
 
7.3%
, 480
 
4.1%
& 215
 
1.8%
: 187
 
1.6%
2 119
 
1.0%
- 110
 
0.9%
' 98
 
0.8%
1 83
 
0.7%
( 68
 
0.6%
Other values (24) 410
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 88936
99.9%
None 96
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9205
 
10.4%
a 8210
 
9.2%
e 7850
 
8.8%
n 5604
 
6.3%
r 5579
 
6.3%
o 5118
 
5.8%
i 4603
 
5.2%
l 4338
 
4.9%
s 3375
 
3.8%
t 3275
 
3.7%
Other values (70) 31779
35.7%
None
ValueCountFrequency (%)
é 32
33.3%
Â’ 15
15.6%
á 8
 
8.3%
ã 5
 
5.2%
í 5
 
5.2%
ó 4
 
4.2%
ú 4
 
4.2%
” 3
 
3.1%
“ 3
 
3.1%
  2
 
2.1%
Other values (10) 15
15.6%

Sex
Categorical

IMBALANCE  MISSING 

Distinct: 6
Distinct (%): 0.1%
Missing: 578
Missing (%): 9.5%
Memory size: 335.2 KiB
Top categories: M (4906), F (606), M with one space character (2), lli (1), N (1)

Length

Max length: 3
Median length: 1
Mean length: 1.000725
Min length: 1

Characters and Unicode

Total characters: 5521
Distinct characters: 7
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: M
5th row: M

Common Values

Value               Count  Frequency (%)
M                   4906   80.5%
F                   606    9.9%
M (with one space)  2      < 0.1%
lli                 1      < 0.1%
N                   1      < 0.1%
.                   1      < 0.1%
(Missing)           578    9.5%
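
The table above shows a handful of stray codes next to M and F. A minimal cleaning sketch, assuming that only M and F are meaningful, that the padded M should be merged into M, and that everything else is recoded to a hypothetical "unknown" label (none of these choices comes from the report). Whether the extra space leads or trails the M is not visible here; `strip()` covers both cases.

```python
from collections import Counter

# Raw non-missing Sex counts copied from the Common Values table.
# "M " stands for the M variant that carries one space (position assumed).
raw_counts = {"M": 4906, "F": 606, "M ": 2, "lli": 1, "N": 1, ".": 1}

VALID = {"M", "F"}  # assumption: the only meaningful codes

cleaned = Counter()
for value, count in raw_counts.items():
    code = value.strip().upper()  # drop padding, normalize case
    cleaned[code if code in VALID else "unknown"] += count

for code, count in cleaned.most_common():
    print(f"{code}: {count}")
```

After stripping, the M total matches the 4908 shown in this variable's lowercase word-frequency table.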

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
m 4908
89.0%
f 606
 
11.0%
lli 1
 
< 0.1%
n 1
 
< 0.1%
1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
M 4908
88.9%
F 606
 
11.0%
2
 
< 0.1%
l 2
 
< 0.1%
i 1
 
< 0.1%
N 1
 
< 0.1%
. 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5515
99.9%
Lowercase Letter 3
 
0.1%
Space Separator 2
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 4908
89.0%
F 606
 
11.0%
N 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
l 2
66.7%
i 1
33.3%
Space Separator
ValueCountFrequency (%)
2
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5518
99.9%
Common 3
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 4908
88.9%
F 606
 
11.0%
l 2
 
< 0.1%
i 1
 
< 0.1%
N 1
 
< 0.1%
Common
ValueCountFrequency (%)
2
66.7%
. 1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5521
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 4908
88.9%
F 606
 
11.0%
2
 
< 0.1%
l 2
 
< 0.1%
i 1
 
< 0.1%
N 1
 
< 0.1%
. 1
 
< 0.1%

Age
Text

MISSING 

Distinct: 151
Distinct (%): 4.5%
Missing: 2721
Missing (%): 44.6%
Memory size: 279.9 KiB

Length

Max length: 23
Median length: 2
Mean length: 2.0815056
Min length: 1

Characters and Unicode

Total characters: 7023
Distinct characters: 52
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 68
Unique (%): 2.0%

Sample

1st row: 48
2nd row: 19
3rd row: 30
4th row: 32
5th row: 20
ValueCountFrequency (%)
17 154
 
4.4%
18 151
 
4.4%
19 142
 
4.1%
20 142
 
4.1%
16 138
 
4.0%
15 135
 
3.9%
21 120
 
3.5%
22 115
 
3.3%
24 104
 
3.0%
25 104
 
3.0%
Other values (100) 2160
62.3%

Most occurring characters

ValueCountFrequency (%)
1 1362
19.4%
2 1342
19.1%
3 846
12.0%
4 644
9.2%
5 566
8.1%
0 407
 
5.8%
6 403
 
5.7%
7 371
 
5.3%
8 366
 
5.2%
9 341
 
4.9%
Other values (42) 375
 
5.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6648
94.7%
Lowercase Letter 178
 
2.5%
Space Separator 115
 
1.6%
Other Punctuation 42
 
0.6%
Uppercase Letter 32
 
0.5%
Dash Punctuation 3
 
< 0.1%
Other Number 2
 
< 0.1%
Close Punctuation 1
 
< 0.1%
Open Punctuation 1
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 33
18.5%
o 26
14.6%
s 25
14.0%
n 21
11.8%
r 16
9.0%
t 15
8.4%
d 7
 
3.9%
m 6
 
3.4%
l 5
 
2.8%
u 5
 
2.8%
Other values (5) 19
10.7%
Uppercase Letter
ValueCountFrequency (%)
T 10
31.2%
E 5
15.6%
M 3
 
9.4%
F 2
 
6.2%
A 2
 
6.2%
N 2
 
6.2%
C 1
 
3.1%
K 1
 
3.1%
B 1
 
3.1%
X 1
 
3.1%
Other values (4) 4
 
12.5%
Decimal Number
ValueCountFrequency (%)
1 1362
20.5%
2 1342
20.2%
3 846
12.7%
4 644
9.7%
5 566
8.5%
0 407
 
6.1%
6 403
 
6.1%
7 371
 
5.6%
8 366
 
5.5%
9 341
 
5.1%
Other Punctuation
ValueCountFrequency (%)
& 22
52.4%
, 7
 
16.7%
? 5
 
11.9%
" 4
 
9.5%
. 3
 
7.1%
' 1
 
2.4%
Space Separator
ValueCountFrequency (%)
114
99.1%
  1
 
0.9%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Math Symbol
ValueCountFrequency (%)
> 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 6813
97.0%
Latin 210
 
3.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 33
15.7%
o 26
12.4%
s 25
11.9%
n 21
10.0%
r 16
 
7.6%
t 15
 
7.1%
T 10
 
4.8%
d 7
 
3.3%
m 6
 
2.9%
l 5
 
2.4%
Other values (19) 46
21.9%
Common
ValueCountFrequency (%)
1 1362
20.0%
2 1342
19.7%
3 846
12.4%
4 644
9.5%
5 566
8.3%
0 407
 
6.0%
6 403
 
5.9%
7 371
 
5.4%
8 366
 
5.4%
9 341
 
5.0%
Other values (13) 165
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7020
> 99.9%
None 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 1362
19.4%
2 1342
19.1%
3 846
12.1%
4 644
9.2%
5 566
8.1%
0 407
 
5.8%
6 403
 
5.7%
7 371
 
5.3%
8 366
 
5.2%
9 341
 
4.9%
Other values (40) 372
 
5.3%
None
ValueCountFrequency (%)
½ 2
66.7%
  1
33.3%
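
The character statistics above show that Age is mostly digits (94.7% Decimal Number) but also contains letters, punctuation such as "&" and "?", the fraction "½", and a no-break space, which is why the profiler types it as Text. A minimal sketch of one way to coerce such values to numbers follows. The function name `parse_age` and its rules (take the first run of digits, treat a directly following "½" as +0.5, return None when no digit is present) are assumptions made for illustration, not part of the report.

```python
import re
from typing import Optional


def parse_age(raw: Optional[str]) -> Optional[float]:
    """Return the first number found in a free-text age, or None."""
    if raw is None:
        return None
    # The report lists a no-break space among the characters; normalize it.
    text = raw.replace("\u00a0", " ").strip()
    match = re.search(r"\d+", text)
    if match is None:
        return None  # purely textual entries stay missing
    age = float(match.group())
    # Assumption: a vulgar half right after the digits means "and a half".
    if text[match.end():match.end() + 1] == "\u00bd":
        age += 0.5
    return age


if __name__ == "__main__":
    for value in ["48", " 6\u00bd", "18 & 22", "Teen", None]:
        print(repr(value), "->", parse_age(value))
```

Taking only the first number is a deliberate simplification: entries that combine several ages would need a separate rule, chosen after inspecting the actual values.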

Injury
Text

Distinct: 3645
Distinct (%): 60.1%
Missing: 29
Missing (%): 0.5%
Memory size: 531.4 KiB

Length

Max length: 235
Median length: 152
Mean length: 31.925816
Min length: 5

Characters and Unicode

Total characters: 193662
Distinct characters: 81
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 3267
Unique (%): 53.9%

Sample

1st row: No injury, board bitten
2nd row: No injury, knocke off board
3rd row: Laceration to left foot
4th row: Bruise to leg, cuts to hand sustained when he hit the shark
5th row: Laceration to shin
ValueCountFrequency (%)
bitten 1539
 
4.7%
to 1524
 
4.6%
fatal 1296
 
3.9%
shark 1227
 
3.7%
1024
 
3.1%
injury 915
 
2.8%
no 871
 
2.6%
leg 863
 
2.6%
right 829
 
2.5%
left 820
 
2.5%
Other values (1940) 22066
66.9%

Most occurring characters

ValueCountFrequency (%)
28050
14.5%
e 16361
 
8.4%
t 14291
 
7.4%
a 11101
 
5.7%
r 11100
 
5.7%
o 11008
 
5.7%
i 9375
 
4.8%
n 9297
 
4.8%
s 7112
 
3.7%
h 6350
 
3.3%
Other values (71) 69617
35.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 140004
72.3%
Space Separator 28050
 
14.5%
Uppercase Letter 20532
 
10.6%
Other Punctuation 3856
 
2.0%
Decimal Number 953
 
0.5%
Dash Punctuation 127
 
0.1%
Open Punctuation 49
 
< 0.1%
Close Punctuation 49
 
< 0.1%
Control 38
 
< 0.1%
Math Symbol 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 16361
11.7%
t 14291
 
10.2%
a 11101
 
7.9%
r 11100
 
7.9%
o 11008
 
7.9%
i 9375
 
6.7%
n 9297
 
6.6%
s 7112
 
5.1%
h 6350
 
4.5%
d 5787
 
4.1%
Other values (17) 38222
27.3%
Uppercase Letter
ValueCountFrequency (%)
A 2820
13.7%
L 2512
12.2%
T 2041
9.9%
N 1963
9.6%
F 1566
 
7.6%
I 1220
 
5.9%
D 1209
 
5.9%
O 1154
 
5.6%
E 1137
 
5.5%
R 950
 
4.6%
Other values (14) 3960
19.3%
Decimal Number
ValueCountFrequency (%)
1 174
18.3%
2 172
18.0%
3 145
15.2%
0 105
11.0%
5 91
9.5%
4 86
9.0%
6 57
 
6.0%
9 47
 
4.9%
8 42
 
4.4%
7 34
 
3.6%
Other Punctuation
ValueCountFrequency (%)
, 2016
52.3%
& 994
25.8%
. 373
 
9.7%
" 170
 
4.4%
' 152
 
3.9%
; 67
 
1.7%
/ 56
 
1.5%
: 20
 
0.5%
? 8
 
0.2%
Control
ValueCountFrequency (%)
Â’ 24
63.2%
“ 7
 
18.4%
” 7
 
18.4%
Open Punctuation
ValueCountFrequency (%)
( 33
67.3%
[ 16
32.7%
Close Punctuation
ValueCountFrequency (%)
) 33
67.3%
] 16
32.7%
Math Symbol
ValueCountFrequency (%)
+ 3
75.0%
> 1
 
25.0%
Space Separator
ValueCountFrequency (%)
28050
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 127
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 160536
82.9%
Common 33126
 
17.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 16361
 
10.2%
t 14291
 
8.9%
a 11101
 
6.9%
r 11100
 
6.9%
o 11008
 
6.9%
i 9375
 
5.8%
n 9297
 
5.8%
s 7112
 
4.4%
h 6350
 
4.0%
d 5787
 
3.6%
Other values (41) 58754
36.6%
Common
ValueCountFrequency (%)
28050
84.7%
, 2016
 
6.1%
& 994
 
3.0%
. 373
 
1.1%
1 174
 
0.5%
2 172
 
0.5%
" 170
 
0.5%
' 152
 
0.5%
3 145
 
0.4%
- 127
 
0.4%
Other values (20) 753
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 193623
> 99.9%
None 39
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
28050
14.5%
e 16361
 
8.4%
t 14291
 
7.4%
a 11101
 
5.7%
r 11100
 
5.7%
o 11008
 
5.7%
i 9375
 
4.8%
n 9297
 
4.8%
s 7112
 
3.7%
h 6350
 
3.3%
Other values (67) 69578
35.9%
None
ValueCountFrequency (%)
Â’ 24
61.5%
“ 7
 
17.9%
” 7
 
17.9%
ê 1
 
2.6%
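
The Control category and the "None" block above indicate that Injury contains a few dozen characters outside ASCII that the profiler classifies as control characters. These are presumably C1 code points (U+0080 to U+009F) left behind when Windows-1252 punctuation, such as curly quotes and apostrophes, was read as Latin-1. Under that assumption, a minimal sketch of a repair step is shown below; `repair_c1_controls` is a hypothetical helper name.

```python
def repair_c1_controls(text: str) -> str:
    """Map C1 control code points (U+0080..U+009F) through Windows-1252."""
    repaired = []
    for ch in text:
        if "\u0080" <= ch <= "\u009f":
            try:
                # Reinterpret the single byte as Windows-1252 punctuation.
                ch = ch.encode("latin-1").decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte is undefined in cp1252: keep the character as-is
        repaired.append(ch)
    return "".join(repaired)


if __name__ == "__main__":
    print(repair_c1_controls("shark\u0092s teeth"))
```

Characters outside the C1 range, including legitimate accented letters such as "ê", pass through unchanged.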

Fatal (Y/N)
Categorical

IMBALANCE 

Distinct: 9
Distinct (%): 0.1%
Missing: 31
Missing (%): 0.5%
Memory size: 345.4 KiB

N: 4391
Y: 1566
UNKNOWN: 94
N (whitespace-padded variant): 8
2017: 1
Other values (4): 4

Length

Max length: 7
Median length: 1
Mean length: 1.0959763
Min length: 1

Characters and Unicode

Total characters: 6646
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: N
2nd row: N
3rd row: N
4th row: N
5th row: N

Common Values

Value                             Count   Frequency (%)
N                                 4391    72.0%
Y                                 1566    25.7%
UNKNOWN                           94      1.5%
N (whitespace-padded variant)     8       0.1%
2017                              1       < 0.1%
F                                 1       < 0.1%
N (whitespace-padded variant)     1       < 0.1%
#VALUE!                           1       < 0.1%
n                                 1       < 0.1%
(Missing)                         31      0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count   Frequency (%)
n         4401    72.6%
y         1566    25.8%
unknown   94      1.6%
2017      1       < 0.1%
f         1       < 0.1%
value     1       < 0.1%

Most occurring characters

ValueCountFrequency (%)
N 4682
70.4%
Y 1566
 
23.6%
U 95
 
1.4%
K 94
 
1.4%
O 94
 
1.4%
W 94
 
1.4%
9
 
0.1%
V 1
 
< 0.1%
! 1
 
< 0.1%
E 1
 
< 0.1%
Other values (9) 9
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 6630
99.8%
Space Separator 9
 
0.1%
Decimal Number 4
 
0.1%
Other Punctuation 2
 
< 0.1%
Lowercase Letter 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 4682
70.6%
Y 1566
 
23.6%
U 95
 
1.4%
K 94
 
1.4%
O 94
 
1.4%
W 94
 
1.4%
V 1
 
< 0.1%
E 1
 
< 0.1%
L 1
 
< 0.1%
A 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 1
25.0%
7 1
25.0%
0 1
25.0%
2 1
25.0%
Other Punctuation
ValueCountFrequency (%)
! 1
50.0%
# 1
50.0%
Space Separator
ValueCountFrequency (%)
9
100.0%
Lowercase Letter
ValueCountFrequency (%)
n 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6631
99.8%
Common 15
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 4682
70.6%
Y 1566
 
23.6%
U 95
 
1.4%
K 94
 
1.4%
O 94
 
1.4%
W 94
 
1.4%
V 1
 
< 0.1%
E 1
 
< 0.1%
L 1
 
< 0.1%
A 1
 
< 0.1%
Other values (2) 2
 
< 0.1%
Common
ValueCountFrequency (%)
9
60.0%
! 1
 
6.7%
1 1
 
6.7%
# 1
 
6.7%
7 1
 
6.7%
0 1
 
6.7%
2 1
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6646
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 4682
70.4%
Y 1566
 
23.6%
U 95
 
1.4%
K 94
 
1.4%
O 94
 
1.4%
W 94
 
1.4%
9
 
0.1%
V 1
 
< 0.1%
! 1
 
< 0.1%
E 1
 
< 0.1%
Other values (9) 9
 
0.1%
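
The Common Values for Fatal (Y/N) mix the intended codes (N, Y, UNKNOWN) with whitespace-padded and lowercase variants of N and with stray entries such as "2017", "F" and "#VALUE!". A minimal sketch of a normalization step is given below. The name `normalize_fatal` and the decision to map every unrecognized entry (including "F") to None are assumptions; a reviewer may prefer to resolve such entries by hand.

```python
from typing import Optional

# Codes treated as valid after trimming and upper-casing (an assumption).
VALID_CODES = {"Y", "N", "UNKNOWN"}


def normalize_fatal(raw: Optional[str]) -> Optional[str]:
    """Trim and upper-case a Fatal (Y/N) entry; return None if not a known code."""
    if raw is None:
        return None
    value = raw.strip().upper()
    return value if value in VALID_CODES else None


if __name__ == "__main__":
    for entry in ["N", " N", "N ", "n", "Y", "UNKNOWN", "2017", "F", "#VALUE!"]:
        print(repr(entry), "->", normalize_fatal(entry))
```

After this step the four spellings of N collapse into one, which matches the lowercase word count of 4401 reported above (4391 + 8 + 1 + 1).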

Time
Text

MISSING 

Distinct: 360
Distinct (%): 12.6%
Missing: 3247
Missing (%): 53.3%
Memory size: 276.2 KiB

Length

Max length: 69
Median length: 5
Mean length: 5.7865169
Min length: 1

Characters and Unicode

Total characters: 16480
Distinct characters: 61
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 197
Unique (%): 6.9%

Sample

1st row: 08h30
2nd row: 15h45
3rd row: 10h00
4th row: Shortly before 12h00
5th row: Morning
Value                Count   Frequency (%)
afternoon            223     7.4%
11h00                131     4.3%
morning              127     4.2%
12h00                113     3.7%
15h00                103     3.4%
14h00                99      3.3%
16h00                98      3.2%
16h30                74      2.4%
14h30                74      2.4%
13h00                73      2.4%
Other values (310)   1912    63.2%

Most occurring characters

ValueCountFrequency (%)
0 3522
21.4%
h 2402
14.6%
1 2367
14.4%
3 918
 
5.6%
n 826
 
5.0%
5 700
 
4.2%
o 620
 
3.8%
4 470
 
2.9%
r 419
 
2.5%
t 382
 
2.3%
Other values (51) 3854
23.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 9336
56.7%
Lowercase Letter 6238
37.9%
Uppercase Letter 586
 
3.6%
Space Separator 198
 
1.2%
Other Punctuation 82
 
0.5%
Dash Punctuation 27
 
0.2%
Math Symbol 7
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Open Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
h 2402
38.5%
n 826
 
13.2%
o 620
 
9.9%
r 419
 
6.7%
t 382
 
6.1%
e 373
 
6.0%
i 270
 
4.3%
f 252
 
4.0%
g 234
 
3.8%
a 140
 
2.2%
Other values (13) 320
 
5.1%
Uppercase Letter
ValueCountFrequency (%)
A 206
35.2%
M 158
27.0%
N 62
 
10.6%
E 55
 
9.4%
L 39
 
6.7%
D 22
 
3.8%
P 15
 
2.6%
S 12
 
2.0%
B 6
 
1.0%
J 6
 
1.0%
Other values (5) 5
 
0.9%
Decimal Number
ValueCountFrequency (%)
0 3522
37.7%
1 2367
25.4%
3 918
 
9.8%
5 700
 
7.5%
4 470
 
5.0%
2 375
 
4.0%
7 287
 
3.1%
6 287
 
3.1%
8 230
 
2.5%
9 180
 
1.9%
Other Punctuation
ValueCountFrequency (%)
. 55
67.1%
" 14
 
17.1%
/ 8
 
9.8%
& 3
 
3.7%
? 1
 
1.2%
: 1
 
1.2%
Space Separator
ValueCountFrequency (%)
197
99.5%
  1
 
0.5%
Math Symbol
ValueCountFrequency (%)
> 6
85.7%
< 1
 
14.3%
Dash Punctuation
ValueCountFrequency (%)
- 27
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 9656
58.6%
Latin 6824
41.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
h 2402
35.2%
n 826
 
12.1%
o 620
 
9.1%
r 419
 
6.1%
t 382
 
5.6%
e 373
 
5.5%
i 270
 
4.0%
f 252
 
3.7%
g 234
 
3.4%
A 206
 
3.0%
Other values (28) 840
 
12.3%
Common
ValueCountFrequency (%)
0 3522
36.5%
1 2367
24.5%
3 918
 
9.5%
5 700
 
7.2%
4 470
 
4.9%
2 375
 
3.9%
7 287
 
3.0%
6 287
 
3.0%
8 230
 
2.4%
197
 
2.0%
Other values (13) 303
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16479
> 99.9%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3522
21.4%
h 2402
14.6%
1 2367
14.4%
3 918
 
5.6%
n 826
 
5.0%
5 700
 
4.2%
o 620
 
3.8%
4 470
 
2.9%
r 419
 
2.5%
t 382
 
2.3%
Other values (50) 3853
23.4%
None
ValueCountFrequency (%)
  1
100.0%
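
The Time samples and top values above combine clock readings written as "08h30" with free-text descriptions such as "Morning" or "Shortly before 12h00". The sketch below shows one way to pull a clock reading out of such strings. The helper name `minutes_after_midnight` and the choice to ignore purely descriptive values are assumptions for illustration.

```python
import re
from typing import Optional

# Hours 0-23, a literal "h", then two-digit minutes, as in "08h30".
_HHMM = re.compile(r"(?<!\d)([01]?\d|2[0-3])h([0-5]\d)")


def minutes_after_midnight(raw: Optional[str]) -> Optional[int]:
    """Return the first HHhMM reading in the text as minutes since 00:00."""
    if raw is None:
        return None
    match = _HHMM.search(raw)
    if match is None:
        return None  # descriptive values such as "Morning" stay unparsed
    return int(match.group(1)) * 60 + int(match.group(2))


if __name__ == "__main__":
    for value in ["08h30", "15h45", "Shortly before 12h00", "Morning"]:
        print(repr(value), "->", minutes_after_midnight(value))
```

Descriptive values could be mapped to coarse buckets in a second pass; that mapping is left out here because the report does not define it.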

Species
Text

MISSING 

Distinct: 1554
Distinct (%): 50.1%
Missing: 2996
Missing (%): 49.2%
Memory size: 336.2 KiB

Length

Max length: 196
Median length: 136
Mean length: 22.668925
Min length: 1

Characters and Unicode

Total characters: 70251
Distinct characters: 86
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 1226
Unique (%): 39.6%

Sample

1st row: White shark, 4 m
2nd row: 7 gill shark
3rd row: 3m shark, probably a smooth hound
4th row: 8' shark
5th row: Tiger shark
Value                Count   Frequency (%)
shark                3002    21.4%
m                    1440    10.3%
to                   768     5.5%
white                630     4.5%
6                    306     2.2%
4                    299     2.1%
5                    296     2.1%
3                    276     2.0%
tiger                271     1.9%
a                    244     1.7%
Other values (862)   6496    46.3%

Most occurring characters

ValueCountFrequency (%)
11558
16.5%
r 4686
 
6.7%
a 4633
 
6.6%
h 4387
 
6.2%
s 3969
 
5.6%
e 3533
 
5.0%
k 3430
 
4.9%
t 2774
 
3.9%
o 2420
 
3.4%
i 2391
 
3.4%
Other values (76) 26470
37.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 43518
61.9%
Space Separator 11561
 
16.5%
Decimal Number 6231
 
8.9%
Other Punctuation 4616
 
6.6%
Uppercase Letter 2038
 
2.9%
Close Punctuation 1028
 
1.5%
Open Punctuation 1028
 
1.5%
Dash Punctuation 177
 
0.3%
Math Symbol 28
 
< 0.1%
Control 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 4686
10.8%
a 4633
10.6%
h 4387
10.1%
s 3969
9.1%
e 3533
 
8.1%
k 3430
 
7.9%
t 2774
 
6.4%
o 2420
 
5.6%
i 2391
 
5.5%
m 2292
 
5.3%
Other values (17) 9003
20.7%
Uppercase Letter
ValueCountFrequency (%)
W 494
24.2%
T 308
15.1%
B 299
14.7%
S 256
12.6%
N 80
 
3.9%
G 68
 
3.3%
R 67
 
3.3%
P 63
 
3.1%
C 58
 
2.8%
M 55
 
2.7%
Other values (15) 290
14.2%
Other Punctuation
ValueCountFrequency (%)
' 1952
42.3%
. 1327
28.7%
, 973
21.1%
" 259
 
5.6%
? 41
 
0.9%
& 37
 
0.8%
; 12
 
0.3%
/ 8
 
0.2%
: 6
 
0.1%
* 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 1308
21.0%
5 966
15.5%
2 837
13.4%
4 655
10.5%
3 616
9.9%
6 538
8.6%
0 444
 
7.1%
8 422
 
6.8%
7 276
 
4.4%
9 169
 
2.7%
Math Symbol
ValueCountFrequency (%)
> 21
75.0%
< 4
 
14.3%
+ 3
 
10.7%
Control
ValueCountFrequency (%)
” 12
48.0%
“ 10
40.0%
Â’ 3
 
12.0%
Space Separator
ValueCountFrequency (%)
11558
> 99.9%
  3
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
] 1000
97.3%
) 28
 
2.7%
Open Punctuation
ValueCountFrequency (%)
[ 1000
97.3%
( 28
 
2.7%
Dash Punctuation
ValueCountFrequency (%)
- 177
100.0%
Other Number
ValueCountFrequency (%)
½ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 45556
64.8%
Common 24695
35.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 4686
10.3%
a 4633
10.2%
h 4387
 
9.6%
s 3969
 
8.7%
e 3533
 
7.8%
k 3430
 
7.5%
t 2774
 
6.1%
o 2420
 
5.3%
i 2391
 
5.2%
m 2292
 
5.0%
Other values (42) 11041
24.2%
Common
ValueCountFrequency (%)
11558
46.8%
' 1952
 
7.9%
. 1327
 
5.4%
1 1308
 
5.3%
] 1000
 
4.0%
[ 1000
 
4.0%
, 973
 
3.9%
5 966
 
3.9%
2 837
 
3.4%
4 655
 
2.7%
Other values (24) 3119
 
12.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 70221
> 99.9%
None 30
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11558
16.5%
r 4686
 
6.7%
a 4633
 
6.6%
h 4387
 
6.2%
s 3969
 
5.7%
e 3533
 
5.0%
k 3430
 
4.9%
t 2774
 
4.0%
o 2420
 
3.4%
i 2391
 
3.4%
Other values (70) 26440
37.7%
None
ValueCountFrequency (%)
” 12
40.0%
“ 10
33.3%
  3
 
10.0%
Â’ 3
 
10.0%
½ 1
 
3.3%
ã 1
 
3.3%
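
The Species samples above embed a size estimate in the species text, sometimes in metres ("White shark, 4 m", "3m shark") and sometimes in feet ("8' shark"). The sketch below extracts a length in metres under two assumptions that the report does not state: a number followed by a standalone "m" is metres, and a number followed by an apostrophe is feet. `length_in_metres` is a hypothetical helper name.

```python
import re
from typing import Optional

_METRES = re.compile(r"(\d+(?:\.\d+)?)\s*m\b")  # "4 m", "3m"
_FEET = re.compile(r"(\d+(?:\.\d+)?)\s*'")      # "8'"
FEET_TO_METRES = 0.3048


def length_in_metres(raw: Optional[str]) -> Optional[float]:
    """Return the first length found in the text, converted to metres."""
    if raw is None:
        return None
    metres = _METRES.search(raw)
    if metres:
        return float(metres.group(1))
    feet = _FEET.search(raw)
    if feet:
        return round(float(feet.group(1)) * FEET_TO_METRES, 2)
    return None  # no recognizable length, e.g. "Tiger shark"


if __name__ == "__main__":
    for value in ["White shark, 4 m", "3m shark", "8' shark", "7 gill shark"]:
        print(repr(value), "->", length_in_metres(value))
```

Ranges and spelled-out units are not handled; they would need additional patterns chosen after looking at the actual values.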
Distinct: 4831
Distinct (%): 79.5%
Missing: 18
Missing (%): 0.3%
Memory size: 535.0 KiB

Length

Max length: 210
Median length: 139
Mean length: 32.807142
Min length: 3

Characters and Unicode

Total characters: 199369
Distinct characters: 92
Distinct categories: 13
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 4497
Unique (%): 74.0%

Sample

1st row: WA Today, 6/11/2017
2nd row: Daytona Beach News-Journal, 6/10/2017
3rd row: C. Moore, GSAF
4th row: Nine News, 6/7/2017
5th row: Tribune 242, 6/2/2017
ValueCountFrequency (%)
gsaf 1119
 
3.8%
550
 
1.9%
m 523
 
1.8%
v.m 518
 
1.7%
coppleson 506
 
1.7%
r 454
 
1.5%
c 420
 
1.4%
j 413
 
1.4%
a 409
 
1.4%
the 403
 
1.4%
Other values (6667) 24333
82.1%

Most occurring characters

ValueCountFrequency (%)
24768
 
12.4%
e 11642
 
5.8%
. 9485
 
4.8%
1 8549
 
4.3%
, 7584
 
3.8%
a 7582
 
3.8%
r 7057
 
3.5%
/ 6627
 
3.3%
n 6624
 
3.3%
o 6433
 
3.2%
Other values (82) 103018
51.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 82948
41.6%
Decimal Number 33827
17.0%
Uppercase Letter 28404
 
14.2%
Other Punctuation 26519
 
13.3%
Space Separator 24776
 
12.4%
Close Punctuation 1112
 
0.6%
Open Punctuation 1110
 
0.6%
Dash Punctuation 652
 
0.3%
Control 10
 
< 0.1%
Connector Punctuation 6
 
< 0.1%
Other values (3) 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 11642
14.0%
a 7582
9.1%
r 7057
 
8.5%
n 6624
 
8.0%
o 6433
 
7.8%
l 6146
 
7.4%
i 6128
 
7.4%
s 4881
 
5.9%
p 4554
 
5.5%
t 4123
 
5.0%
Other values (20) 17778
21.4%
Uppercase Letter
ValueCountFrequency (%)
S 3228
 
11.4%
C 2766
 
9.7%
A 2740
 
9.6%
M 2451
 
8.6%
G 1983
 
7.0%
F 1786
 
6.3%
T 1491
 
5.2%
D 1332
 
4.7%
B 1267
 
4.5%
N 1189
 
4.2%
Other values (16) 8171
28.8%
Other Punctuation
ValueCountFrequency (%)
. 9485
35.8%
, 7584
28.6%
/ 6627
25.0%
; 2017
 
7.6%
& 507
 
1.9%
# 190
 
0.7%
: 48
 
0.2%
' 43
 
0.2%
" 14
 
0.1%
? 2
 
< 0.1%
Other values (2) 2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 8549
25.3%
2 4850
14.3%
9 4748
14.0%
0 3313
 
9.8%
8 2361
 
7.0%
5 2227
 
6.6%
3 2158
 
6.4%
6 2093
 
6.2%
4 1828
 
5.4%
7 1700
 
5.0%
Space Separator
ValueCountFrequency (%)
24768
> 99.9%
  8
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 1104
99.5%
[ 6
 
0.5%
Close Punctuation
ValueCountFrequency (%)
) 1101
99.0%
] 11
 
1.0%
Control
ValueCountFrequency (%)
7
70.0%
Â’ 3
30.0%
Math Symbol
ValueCountFrequency (%)
= 2
66.7%
+ 1
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 652
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 6
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 111352
55.9%
Common 88017
44.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 11642
 
10.5%
a 7582
 
6.8%
r 7057
 
6.3%
n 6624
 
5.9%
o 6433
 
5.8%
l 6146
 
5.5%
i 6128
 
5.5%
s 4881
 
4.4%
p 4554
 
4.1%
t 4123
 
3.7%
Other values (46) 46182
41.5%
Common
ValueCountFrequency (%)
24768
28.1%
. 9485
 
10.8%
1 8549
 
9.7%
, 7584
 
8.6%
/ 6627
 
7.5%
2 4850
 
5.5%
9 4748
 
5.4%
0 3313
 
3.8%
8 2361
 
2.7%
5 2227
 
2.5%
Other values (26) 13505
15.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 199342
> 99.9%
None 27
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
24768
 
12.4%
e 11642
 
5.8%
. 9485
 
4.8%
1 8549
 
4.3%
, 7584
 
3.8%
a 7582
 
3.8%
r 7057
 
3.5%
/ 6627
 
3.3%
n 6624
 
3.3%
o 6433
 
3.2%
Other values (76) 102991
51.7%
None
ValueCountFrequency (%)
é 13
48.1%
  8
29.6%
Â’ 3
 
11.1%
á 1
 
3.7%
è 1
 
3.7%
î 1
 
3.7%

pdf
Text

Distinct: 6083
Distinct (%): 99.8%
Missing: 1
Missing (%): < 0.1%
Memory size: 479.7 KiB

Length

Max length: 44
Median length: 41
Mean length: 23.580899
Min length: 5

Characters and Unicode

Total characters: 143702
Distinct characters: 69
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6072
Unique (%): 99.6%

Sample

1st row: 2017.06.11-Goff.pdf
2nd row: 2017.06.10.b-Flinders.pdf
3rd row: 2017.06.10.a-Brock.pdf
4th row: 2017.06.07.R-Thomson.pdf
5th row: 2017.06.04-Simpson.pdf
ValueCountFrequency (%)
19
 
0.3%
pdf 5
 
0.1%
fisherman.pdf 3
 
< 0.1%
boat.pdf 3
 
< 0.1%
1898.00.00.r-syria.pdf 2
 
< 0.1%
1916.12.08.a-b-german.pdf 2
 
< 0.1%
beach.pdf 2
 
< 0.1%
bay.pdf 2
 
< 0.1%
harbor.pdf 2
 
< 0.1%
1907.10.16.r-hongkong.pdf 2
 
< 0.1%
Other values (6134) 6144
99.3%

Most occurring characters

ValueCountFrequency (%)
. 20055
 
14.0%
0 12966
 
9.0%
1 10259
 
7.1%
d 7262
 
5.1%
- 6939
 
4.8%
p 6483
 
4.5%
f 6437
 
4.5%
2 5896
 
4.1%
9 5807
 
4.0%
a 5661
 
3.9%
Other values (59) 55937
38.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 59486
41.4%
Decimal Number 48295
33.6%
Other Punctuation 20085
 
14.0%
Uppercase Letter 8678
 
6.0%
Dash Punctuation 6939
 
4.8%
Connector Punctuation 121
 
0.1%
Space Separator 98
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d 7262
12.2%
p 6483
10.9%
f 6437
10.8%
a 5661
9.5%
e 4595
 
7.7%
r 3498
 
5.9%
n 3489
 
5.9%
o 3263
 
5.5%
i 3045
 
5.1%
l 2569
 
4.3%
Other values (16) 13184
22.2%
Uppercase Letter
ValueCountFrequency (%)
R 861
 
9.9%
S 782
 
9.0%
B 749
 
8.6%
C 654
 
7.5%
M 632
 
7.3%
N 518
 
6.0%
D 467
 
5.4%
H 423
 
4.9%
P 394
 
4.5%
A 352
 
4.1%
Other values (16) 2846
32.8%
Decimal Number
ValueCountFrequency (%)
0 12966
26.8%
1 10259
21.2%
2 5896
12.2%
9 5807
12.0%
8 2708
 
5.6%
6 2382
 
4.9%
7 2156
 
4.5%
5 2118
 
4.4%
3 2067
 
4.3%
4 1936
 
4.0%
Other Punctuation
ValueCountFrequency (%)
. 20055
99.9%
' 25
 
0.1%
, 3
 
< 0.1%
& 2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 6939
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 121
100.0%
Space Separator
ValueCountFrequency (%)
98
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 75538
52.6%
Latin 68164
47.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
d 7262
 
10.7%
p 6483
 
9.5%
f 6437
 
9.4%
a 5661
 
8.3%
e 4595
 
6.7%
r 3498
 
5.1%
n 3489
 
5.1%
o 3263
 
4.8%
i 3045
 
4.5%
l 2569
 
3.8%
Other values (42) 21862
32.1%
Common
ValueCountFrequency (%)
. 20055
26.5%
0 12966
17.2%
1 10259
13.6%
- 6939
 
9.2%
2 5896
 
7.8%
9 5807
 
7.7%
8 2708
 
3.6%
6 2382
 
3.2%
7 2156
 
2.9%
5 2118
 
2.8%
Other values (7) 4252
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 143702
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 20055
 
14.0%
0 12966
 
9.0%
1 10259
 
7.1%
d 7262
 
5.1%
- 6939
 
4.8%
p 6483
 
4.5%
f 6437
 
4.5%
2 5896
 
4.1%
9 5807
 
4.0%
a 5661
 
3.9%
Other values (59) 55937
38.9%
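
The pdf samples above follow a recognizable pattern: a dotted case identifier, a hyphen, a label, and the ".pdf" extension (for example "2017.06.10.b-Flinders.pdf"). The sketch below splits a filename on that pattern. The helper name `split_pdf_name` and the exact pattern (four-digit year, two-digit month and day, optional single-letter suffixes) are assumptions inferred from the five sample rows only, so filenames that deviate from it return None.

```python
import re
from typing import Optional, Tuple

# Assumed layout: YYYY.MM.DD, optional ".x" letter suffixes, "-", label, ".pdf".
_PDF_NAME = re.compile(r"(\d{4}\.\d{2}\.\d{2}(?:\.[A-Za-z])*)-(.+)\.pdf")


def split_pdf_name(name: str) -> Optional[Tuple[str, str]]:
    """Split a pdf filename into (case identifier, label), or return None."""
    match = _PDF_NAME.fullmatch(name.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ["2017.06.11-Goff.pdf", "2017.06.07.R-Thomson.pdf", "boat.pdf"]:
        print(repr(name), "->", split_pdf_name(name))
```

Rows that return None are candidates for manual inspection rather than proof of bad data.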
Distinct: 6082
Distinct (%): 99.8%
Missing: 2
Missing (%): < 0.1%
Memory size: 800.9 KiB

Length

Max length: 98
Median length: 95
Mean length: 77.563433
Min length: 7

Characters and Unicode

Total characters: 472594
Distinct characters: 73
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6071
Unique (%): 99.6%

Sample

1st row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.11-Goff.pdf
2nd row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.b-Flinders.pdf
3rd row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdf
4th row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.07.R-Thomson.pdf
5th row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdf
ValueCountFrequency (%)
19
 
0.3%
pdf 5
 
0.1%
fisherman.pdf 3
 
< 0.1%
boat.pdf 3
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/1906.09.27.r.a&b-munich-swede.pdf 2
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/1916.07.12.a-b-stillwell-fisher.pdf 2
 
< 0.1%
beach.pdf 2
 
< 0.1%
bay.pdf 2
 
< 0.1%
harbor.pdf 2
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/1921.11.27.a-b-jack.pdf 2
 
< 0.1%
Other values (6133) 6143
99.3%

Most occurring characters

ValueCountFrequency (%)
t 44512
 
9.4%
e 41142
 
8.7%
/ 30455
 
6.4%
a 30022
 
6.4%
r 27862
 
5.9%
s 26466
 
5.6%
. 26139
 
5.5%
d 25535
 
5.4%
p 24755
 
5.2%
h 19488
 
4.1%
Other values (63) 176218
37.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 339661
71.9%
Other Punctuation 62717
 
13.3%
Decimal Number 48285
 
10.2%
Uppercase Letter 8683
 
1.8%
Dash Punctuation 6938
 
1.5%
Connector Punctuation 6212
 
1.3%
Space Separator 98
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 44512
13.1%
e 41142
12.1%
a 30022
8.8%
r 27862
8.2%
s 26466
7.8%
d 25535
 
7.5%
p 24755
 
7.3%
h 19488
 
5.7%
f 18618
 
5.5%
i 15227
 
4.5%
Other values (16) 66034
19.4%
Uppercase Letter
ValueCountFrequency (%)
R 861
 
9.9%
S 782
 
9.0%
B 748
 
8.6%
C 654
 
7.5%
M 632
 
7.3%
N 519
 
6.0%
D 467
 
5.4%
H 423
 
4.9%
P 394
 
4.5%
A 353
 
4.1%
Other values (16) 2850
32.8%
Decimal Number
ValueCountFrequency (%)
0 12962
26.8%
1 10255
21.2%
2 5896
12.2%
9 5805
12.0%
8 2708
 
5.6%
6 2383
 
4.9%
7 2155
 
4.5%
5 2117
 
4.4%
3 2068
 
4.3%
4 1936
 
4.0%
Other Punctuation
ValueCountFrequency (%)
/ 30455
48.6%
. 26139
41.7%
: 6091
 
9.7%
' 25
 
< 0.1%
, 3
 
< 0.1%
& 2
 
< 0.1%
# 1
 
< 0.1%
! 1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 6938
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 6212
100.0%
Space Separator
ValueCountFrequency (%)
98
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 348344
73.7%
Common 124250
 
26.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 44512
12.8%
e 41142
11.8%
a 30022
8.6%
r 27862
 
8.0%
s 26466
 
7.6%
d 25535
 
7.3%
p 24755
 
7.1%
h 19488
 
5.6%
f 18618
 
5.3%
i 15227
 
4.4%
Other values (42) 74717
21.4%
Common
ValueCountFrequency (%)
/ 30455
24.5%
. 26139
21.0%
0 12962
10.4%
1 10255
 
8.3%
- 6938
 
5.6%
_ 6212
 
5.0%
: 6091
 
4.9%
2 5896
 
4.7%
9 5805
 
4.7%
8 2708
 
2.2%
Other values (11) 10789
 
8.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 472594
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 44512
 
9.4%
e 41142
 
8.7%
/ 30455
 
6.4%
a 30022
 
6.4%
r 27862
 
5.9%
s 26466
 
5.6%
. 26139
 
5.5%
d 25535
 
5.4%
p 24755
 
5.2%
h 19488
 
4.1%
Other values (63) 176218
37.3%

href
Text

Distinct: 6076
Distinct (%): 99.7%
Missing: 2
Missing (%): < 0.1%
Memory size: 801.9 KiB

Length

Max length: 135
Median length: 131
Mean length: 77.731331
Min length: 34

Characters and Unicode

Total characters: 473617
Distinct characters: 71
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6062
Unique (%): 99.5%

Sample

1st row: http://sharkattackfile.net/spreadsheets/pdf_directory/http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.11-Goff.pdf
2nd row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.b-Flinders.pdf
3rd row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdf
4th row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.07.R-Thomson.pdf
5th row: http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdf
ValueCountFrequency (%)
21
 
0.3%
pdf 4
 
0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/w014.01.25-grant.pdf 4
 
0.1%
boat.pdf 3
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/2014.10.02.b-vandenberg.pdf 3
 
< 0.1%
fisherman.pdf 3
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/1934.12.23.a-b-inman.pdf 2
 
< 0.1%
crew.pdf 2
 
< 0.1%
http://sharkattackfile.net/spreadsheets/pdf_directory/1916.07.12.a-b-stillwell-fisher.pdf 2
 
< 0.1%
bay.pdf 2
 
< 0.1%
Other values (6129) 6142
99.3%

Most occurring characters

ValueCountFrequency (%)
t 44649
 
9.4%
e 41258
 
8.7%
/ 30549
 
6.5%
a 30097
 
6.4%
r 27943
 
5.9%
s 26538
 
5.6%
. 26147
 
5.5%
d 25590
 
5.4%
p 24805
 
5.2%
h 19545
 
4.1%
Other values (61) 176496
37.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 340541
71.9%
Other Punctuation 62834
 
13.3%
Decimal Number 48291
 
10.2%
Uppercase Letter 8677
 
1.8%
Dash Punctuation 6940
 
1.5%
Connector Punctuation 6232
 
1.3%
Space Separator 102
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 44649
13.1%
e 41258
12.1%
a 30097
8.8%
r 27943
8.2%
s 26538
7.8%
d 25590
 
7.5%
p 24805
 
7.3%
h 19545
 
5.7%
f 18649
 
5.5%
i 15267
 
4.5%
Other values (16) 66200
19.4%
Uppercase Letter
ValueCountFrequency (%)
R 859
 
9.9%
S 782
 
9.0%
B 750
 
8.6%
C 653
 
7.5%
M 631
 
7.3%
N 519
 
6.0%
D 467
 
5.4%
H 423
 
4.9%
P 393
 
4.5%
A 352
 
4.1%
Other values (16) 2848
32.8%
Decimal Number
ValueCountFrequency (%)
0 12967
26.9%
1 10258
21.2%
2 5894
12.2%
9 5808
12.0%
8 2707
 
5.6%
6 2380
 
4.9%
7 2154
 
4.5%
5 2121
 
4.4%
3 2066
 
4.3%
4 1936
 
4.0%
Other Punctuation
ValueCountFrequency (%)
/ 30549
48.6%
. 26147
41.6%
: 6110
 
9.7%
' 24
 
< 0.1%
& 2
 
< 0.1%
, 2
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 6940
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 6232
100.0%
Space Separator
ValueCountFrequency (%)
102
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 349218
73.7%
Common 124399
 
26.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 44649
12.8%
e 41258
11.8%
a 30097
8.6%
r 27943
 
8.0%
s 26538
 
7.6%
d 25590
 
7.3%
p 24805
 
7.1%
h 19545
 
5.6%
f 18649
 
5.3%
i 15267
 
4.4%
Other values (42) 74877
21.4%
Common
ValueCountFrequency (%)
/ 30549
24.6%
. 26147
21.0%
0 12967
10.4%
1 10258
 
8.2%
- 6940
 
5.6%
_ 6232
 
5.0%
: 6110
 
4.9%
2 5894
 
4.7%
9 5808
 
4.7%
8 2707
 
2.2%
Other values (9) 10787
 
8.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 473617
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 44649
 
9.4%
e 41258
 
8.7%
/ 30549
 
6.5%
a 30097
 
6.4%
r 27943
 
5.9%
s 26538
 
5.6%
. 26147
 
5.5%
d 25590
 
5.4%
p 24805
 
5.2%
h 19545
 
4.1%
Other values (61) 176496
37.3%
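
The first sample row of href repeats the directory prefix, which is consistent with its maximum length of 135 characters. A minimal sketch of a repair is shown below. The helper name `collapse_prefix` is hypothetical, and treating the doubled prefix as an error, rather than as the real link, is an assumption.

```python
PREFIX = "http://sharkattackfile.net/spreadsheets/pdf_directory/"


def collapse_prefix(url: str, prefix: str = PREFIX) -> str:
    """Drop extra leading copies of the prefix so that it appears once."""
    while url.startswith(prefix + prefix):
        url = url[len(prefix):]
    return url


if __name__ == "__main__":
    doubled = PREFIX + PREFIX + "2017.06.11-Goff.pdf"
    print(collapse_prefix(doubled))
```

URLs that already carry a single prefix, or none at all, are returned unchanged, so the function can be applied to the whole column.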
Distinct: 6077
Distinct (%): 99.7%
Missing: 1
Missing (%): < 0.1%
Memory size: 402.5 KiB

Length

Max length: 18
Median length: 10
Mean length: 10.61339
Min length: 6

Characters and Unicode

Total characters: 64678
Distinct characters: 35
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 6060
Unique (%): 99.4%

Sample

1st row: 2017.06.11
2nd row: 2017.06.10.b
3rd row: 2017.06.10.a
4th row: 2017.06.07.R
5th row: 2017.06.04
ValueCountFrequency (%)
1923.00.00.a 2
 
< 0.1%
1913.08.27.r 2
 
< 0.1%
b 2
 
< 0.1%
2013.10.05 2
 
< 0.1%
1954.00.00 2
 
< 0.1%
g 2
 
< 0.1%
1966.12.26 2
 
< 0.1%
2012.09.02.b 2
 
< 0.1%
1962.06.11.b 2
 
< 0.1%
1952.08.04 2
 
< 0.1%
Other values (6070) 6081
99.7%

Most occurring characters

ValueCountFrequency (%)
. 14087
21.8%
0 12991
20.1%
1 10238
15.8%
2 5885
9.1%
9 5800
9.0%
8 2705
 
4.2%
6 2381
 
3.7%
7 2151
 
3.3%
5 2110
 
3.3%
3 2066
 
3.2%
Other values (25) 4264
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 48262
74.6%
Other Punctuation 14092
 
21.8%
Lowercase Letter 1525
 
2.4%
Uppercase Letter 757
 
1.2%
Dash Punctuation 32
 
< 0.1%
Space Separator 10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 637
41.8%
b 623
40.9%
c 131
 
8.6%
d 52
 
3.4%
e 29
 
1.9%
f 17
 
1.1%
g 11
 
0.7%
h 8
 
0.5%
j 5
 
0.3%
i 5
 
0.3%
Other values (5) 7
 
0.5%
Decimal Number
ValueCountFrequency (%)
0 12991
26.9%
1 10238
21.2%
2 5885
12.2%
9 5800
12.0%
8 2705
 
5.6%
6 2381
 
4.9%
7 2151
 
4.5%
5 2110
 
4.4%
3 2066
 
4.3%
4 1935
 
4.0%
Other Punctuation
ValueCountFrequency (%)
. 14087
> 99.9%
& 2
 
< 0.1%
/ 2
 
< 0.1%
, 1
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
R 518
68.4%
D 119
 
15.7%
N 119
 
15.7%
T 1
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 32
100.0%
Space Separator
ValueCountFrequency (%)
10
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  62396  96.5%
Latin   2282   3.5%

Most frequent character per script

Latin
Value             Count  Frequency (%)
a                 637    27.9%
b                 623    27.3%
R                 518    22.7%
c                 131    5.7%
D                 119    5.2%
N                 119    5.2%
d                 52     2.3%
e                 29     1.3%
f                 17     0.7%
g                 11     0.5%
Other values (9)  26     1.1%

Common
Value             Count  Frequency (%)
.                 14087  22.6%
0                 12991  20.8%
1                 10238  16.4%
2                 5885   9.4%
9                 5800   9.3%
8                 2705   4.3%
6                 2381   3.8%
7                 2151   3.4%
5                 2110   3.4%
3                 2066   3.3%
Other values (6)  1982   3.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  64678  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
.                  14087  21.8%
0                  12991  20.1%
1                  10238  15.8%
2                  5885   9.1%
9                  5800   9.0%
8                  2705   4.2%
6                  2381   3.7%
7                  2151   3.3%
5                  2110   3.3%
3                  2066   3.2%
Other values (25)  4264   6.6%
Distinct: 6078
Distinct (%): 99.7%
Missing: 1
Missing (%): < 0.1%
Memory size: 402.5 KiB

Length

Max length: 18
Median length: 10
Mean length: 10.613718
Min length: 6

Characters and Unicode

Total characters: 64680
Distinct characters: 34
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6062
Unique (%): 99.5%

Sample

1st row: 2017.06.11
2nd row: 2017.06.10.b
3rd row: 2017.06.10.a
4th row: 2017.06.07.R
5th row: 2017.06.04
Value                Count  Frequency (%)
1923.00.00.a         2      < 0.1%
1990.05.10           2      < 0.1%
(blank)              2      < 0.1%
b                    2      < 0.1%
2009.12.18           2      < 0.1%
1954.00.00           2      < 0.1%
2013.10.05           2      < 0.1%
2014.08.02           2      < 0.1%
1915.07.06.a.r       2      < 0.1%
2006.09.02           2      < 0.1%
Other values (6071)  6081   99.7%

Most occurring characters

Value              Count  Frequency (%)
.                  14089  21.8%
0                  12992  20.1%
1                  10236  15.8%
2                  5888   9.1%
9                  5798   9.0%
8                  2706   4.2%
6                  2379   3.7%
7                  2150   3.3%
5                  2112   3.3%
3                  2066   3.2%
Other values (24)  4264   6.6%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     48262  74.6%
Other Punctuation  14093  21.8%
Lowercase Letter   1526   2.4%
Uppercase Letter   757    1.2%
Dash Punctuation   32     < 0.1%
Space Separator    10     < 0.1%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
a                 638    41.8%
b                 623    40.8%
c                 131    8.6%
d                 52     3.4%
e                 29     1.9%
f                 17     1.1%
g                 11     0.7%
h                 8      0.5%
j                 5      0.3%
i                 5      0.3%
Other values (5)  7      0.5%

Decimal Number
Value  Count  Frequency (%)
0      12992  26.9%
1      10236  21.2%
2      5888   12.2%
9      5798   12.0%
8      2706   5.6%
6      2379   4.9%
7      2150   4.5%
5      2112   4.4%
3      2066   4.3%
4      1935   4.0%

Other Punctuation
Value  Count  Frequency (%)
.      14089  > 99.9%
&      2      < 0.1%
,      1      < 0.1%
/      1      < 0.1%

Uppercase Letter
Value  Count  Frequency (%)
R      519    68.6%
D      119    15.7%
N      119    15.7%

Dash Punctuation
Value  Count  Frequency (%)
-      32     100.0%

Space Separator
Value    Count  Frequency (%)
(space)  10     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  62397  96.5%
Latin   2283   3.5%

Most frequent character per script

Latin
Value             Count  Frequency (%)
a                 638    27.9%
b                 623    27.3%
R                 519    22.7%
c                 131    5.7%
D                 119    5.2%
N                 119    5.2%
d                 52     2.3%
e                 29     1.3%
f                 17     0.7%
g                 11     0.5%
Other values (8)  25     1.1%

Common
Value             Count  Frequency (%)
.                 14089  22.6%
0                 12992  20.8%
1                 10236  16.4%
2                 5888   9.4%
9                 5798   9.3%
8                 2706   4.3%
6                 2379   3.8%
7                 2150   3.4%
5                 2112   3.4%
3                 2066   3.3%
Other values (6)  1981   3.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  64680  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
.                  14089  21.8%
0                  12992  20.1%
1                  10236  15.8%
2                  5888   9.1%
9                  5798   9.0%
8                  2706   4.2%
6                  2379   3.7%
7                  2150   3.3%
5                  2112   3.3%
3                  2066   3.2%
Other values (24)  4264   6.6%

original order
Real number (ℝ)

HIGH CORRELATION  UNIFORM 

Distinct: 6093
Distinct (%): > 99.9%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3048.4997
Minimum: 2
Maximum: 6095
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 306.65
Q1: 1525.25
median: 3048.5
Q3: 4571.75
95-th percentile: 5790.35
Maximum: 6095
Range: 6093
Interquartile range (IQR): 3046.5

Descriptive statistics

Standard deviation: 1759.3311
Coefficient of variation (CV): 0.57711375
Kurtosis: -1.1999998
Mean: 3048.4997
Median Absolute Deviation (MAD): 1523.5
Skewness: -5.5140017 × 10^-7
Sum: 18577557
Variance: 3095245.8
Monotonicity: Decreasing
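
Several of the figures reported for original order follow directly from the others. The short check below recomputes the interquartile range, range, coefficient of variation, and variance from the reported quartiles, extremes, mean, and standard deviation. Because the inputs are the rounded values printed in this report, the results agree only to within rounding.

```python
# Figures copied from the "original order" summary in this report.
q1, q3 = 1525.25, 4571.75
minimum, maximum = 2, 6095
mean, std = 3048.4997, 1759.3311

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation

print("IQR:", iqr)
print("Range:", value_range)
print("CV:", round(cv, 6))
print("Variance:", round(variance, 1))
```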
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
569                  2      < 0.1%
6095                 1      < 0.1%
2036                 1      < 0.1%
2027                 1      < 0.1%
2028                 1      < 0.1%
2029                 1      < 0.1%
2030                 1      < 0.1%
2031                 1      < 0.1%
2032                 1      < 0.1%
2033                 1      < 0.1%
Other values (6083)  6083   99.8%

Minimum 10 values
Value  Count  Frequency (%)
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
11     1      < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
6095   1      < 0.1%
6094   1      < 0.1%
6093   1      < 0.1%
6092   1      < 0.1%
6091   1      < 0.1%
6090   1      < 0.1%
6089   1      < 0.1%
6088   1      < 0.1%
6087   1      < 0.1%
6086   1      < 0.1%

Interactions


Correlations

                Fatal (Y/N)  Sex     Type   Year    original order
Fatal (Y/N)     1.000        0.000   0.146  -0.337  -0.337
Sex             0.000        1.000   0.083  -0.149  -0.149
Type            0.146        0.083   1.000  0.083   0.082
Year            -0.337       -0.149  0.083  1.000   1.000
original order  -0.337       -0.149  0.082  1.000   1.000
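
This section does not state which coefficient produced the matrix, and the table mixes numeric and categorical variables, so more than one measure may be involved. As one plausible reading for the numeric pair (Year and original order), the sketch below computes a Spearman rank correlation in plain Python. The input values are hypothetical and chosen only to show that a near-monotonic relationship yields a coefficient close to 1; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    positions = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(positions):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(positions) and values[positions[j + 1]] == values[positions[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[positions[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: later years paired with higher row-order numbers.
year = [1900, 1950, 1950, 1990, 2010, 2017]
original_order = [10, 800, 805, 3000, 5500, 6095]

rho = spearman(year, original_order)
print(round(rho, 3))
```

The single tie in the hypothetical year values is what keeps the coefficient slightly below 1.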

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
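
One common way to compute a nullity correlation is to turn each column into a present/missing indicator and take the Pearson correlation of the indicators. The sketch below does this for the Age and Time values of the first six sample rows in this report, with `None` standing in for NaN. Whether the heatmap above was produced exactly this way is an assumption, and the function name is illustrative.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# Age and Time from sample rows 0 to 5; None marks a missing value.
age = [48, None, 19, 30, None, 32]
time = ["08h30", "15h45", "10h00", None, None, "Shortly before 12h00"]

r = nullity_correlation(age, time)
print(round(r, 2))
```

A value near 1 would mean the two columns tend to be missing together, a value near -1 that one tends to be present when the other is missing, and a value near 0 that their missingness is unrelated.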

Sample

Case NumberDateYearTypeCountryAreaLocationActivityNameSexAgeInjuryFatal (Y/N)TimeSpeciesInvestigator or Sourcepdfhref formulahrefCase Number.1Case Number.2original order
02017.06.112017-06-112017.0UnprovokedAUSTRALIAWestern AustraliaPoint Casuarina, BunburyBody boardingPaul GoffM48No injury, board bittenN08h30White shark, 4 mWA Today, 6/11/20172017.06.11-Goff.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.11-Goff.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.11-Goff.pdf2017.06.112017.06.116095.0
12017.06.10.b2017-06-102017.0UnprovokedAUSTRALIAVictoriaFlinders, Mornington PenisulaSurfingfemaleFNaNNo injury, knocke off boardN15h457 gill sharkNaN2017.06.10.b-Flinders.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.b-Flinders.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.b-Flinders.pdf2017.06.10.b2017.06.10.b6094.0
22017.06.10.a2017-06-102017.0UnprovokedUSAFloridaPonce Inlet, Volusia CountySurfingBryan BrockM19Laceration to left footN10h00NaNDaytona Beach News-Journal, 6/10/20172017.06.10.a-Brock.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdf2017.06.10.a2017.06.10.a6093.0
32017.06.07.RReported 07-Jun-20172017.0UnprovokedUNITED KINGDOMSouth DevonBantham BeachSurfingRich ThomsonM30Bruise to leg, cuts to hand sustained when he hit the sharkNNaN3m shark, probably a smooth houndC. Moore, GSAF2017.06.07.R-Thomson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.07.R-Thomson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.07.R-Thomson.pdf2017.06.07.R2017.06.07.R6092.0
42017.06.042017-06-042017.0UnprovokedUSAFloridaMiddle Sambo Reef off Boca Chica, Monroe CountySpearfishingParker SimpsonMNaNLaceration to shinNNaN8' sharkNine News, 6/7/20172017.06.04-Simpson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdf2017.06.042017.06.046091.0
52017.06.022017-06-022017.0UnprovokedBAHAMASNew ProvidenceAthol IslandSnorkelingTiffany JohnsonF32Right forearm severedNShortly before 12h00Tiger sharkTribune 242, 6/2/20172017.06.02-Johnson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.02-Johnson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.02-Johnson.pdf2017.06.022017.06.026090.0
62017.05.302017-05-302017.0ProvokedUSASouth CarolinaAwendaw, Charleston CountyTouching a sharkMackenzie HigginsF20Right hand bitten by hooked shark PROVOKED INCIDENTNNaN3' sharkC. Creswell, GSAF2017.05.30-Higgins.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.30-Higgins.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.30-Higgins.pdf2017.05.302017.05.306089.0
72017.05.282017-05-282017.0UnprovokedUSAFloridaOff JupiterFeeding sharksRandy JordanMNaNLacerations to right armNMorningTiger sharkM. Michaelson, GSAF2017.05.28-Jordan.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.28-Jordan.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.28-Jordan.pdf2017.05.282017.05.286088.0
82017.05.272017-05-272017.0NaNAUSTRALIANew South WalesEvans HeadFishingTerry SelwoodM73Abrasion to right forearm from pectoral fin of a shark that leapt into his boatNNaNNaNB. Myatt, GSAF2017.05.27-Selwood.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.27-Selwood.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.27-Selwood.pdf2017.05.272017.05.276087.0
92017.05.122017-05-122017.0UnprovokedUNITED ARAB EMIRATESSharjah,Khor FakkanSpearfishingAl BeloushiM41Right leg severely bittenNMorningNaNGulf News, 5/13/20172017.05.12-Beloushi.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.12-Beloushi.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.12-Beloushi.pdf2017.05.122017.05.126086.0
Case NumberDateYearTypeCountryAreaLocationActivityNameSexAgeInjuryFatal (Y/N)TimeSpeciesInvestigator or Sourcepdfhref formulahrefCase Number.1Case Number.2original order
6085ND.0009Before 19060.0UnprovokedAUSTRALIANaNNaNFishingboyMNaNFATAL, knocked overboard by tail of shark & carried off by sharkYNaNBlue pointerNY Sun, 9/9/1906, referring to account by Louis BeckeND-0009-boy-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0009-boy-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0009-boy-Australia.pdfND.0009ND.000910.0
6086ND.0008Before 19060.0UnprovokedAUSTRALIANaNNaNFishingfishermanMNaNFATALYNaNBlue pointerNY Sun, 9/9/1906, referring to account by Louis BeckeND-0008-Fisherman2-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0008-Fisherman2-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0008-Fisherman2-Australia.pdfND.0008ND.00089.0
6087ND.0007Before 19060.0UnprovokedAUSTRALIANaNNaNFishingfishermanMNaNFATALYNaNBlue pointersNY Sun, 9/9/1906, referring to account by Louis BeckeND-0007 - Fisherman-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0007 - Fisherman-Australia.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0007 - Fisherman-Australia.pdfND.0007ND.00078.0
6088ND.0006Before 19060.0UnprovokedAUSTRALIANew South WalesSwimmingArab boyMNaNFATALYNaNSaid to involve a grey nurse shark that leapt out of the water and seized the boy but species identification is questionableL. Becke in New York Sun, 9/9/1906; L. Schultz & M. Malin, p.523ND-0006-ArabBoy-Prymount.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0006-ArabBoy-Prymount.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0006-ArabBoy-Prymount.pdfND.0006ND.00067.0
6089ND.0005Before 19030.0UnprovokedAUSTRALIAWestern AustraliaRoebuck BayDivingmaleMNaNFATALYNaNNaNH. Taunton; N. Bartlett, p. 234ND-0005-RoebuckBay.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0005-RoebuckBay.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0005-RoebuckBay.pdfND.0005ND.00056.0
6090ND.0004Before 19030.0UnprovokedAUSTRALIAWestern AustraliaNaNPearl divingAhmunMNaNFATALYNaNNaNH. Taunton; N. Bartlett, pp. 233-234ND-0004-Ahmun.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0004-Ahmun.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0004-Ahmun.pdfND.0004ND.00045.0
6091ND.00031900-19050.0UnprovokedUSANorth CarolinaOcracoke InletSwimmingCoast Guard personnelMNaNFATALYNaNNaNF. Schwartz, p.23; C. Creswell, GSAFND-0003-Ocracoke_1900-1905.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0003-Ocracoke_1900-1905.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0003-Ocracoke_1900-1905.pdfND.0003ND.00034.0
6092ND.00021883-18890.0UnprovokedPANAMANaNPanama Bay 8ºN, 79ºWNaNJules PattersonMNaNFATALYNaNNaNThe Sun, 10/20/1938ND-0002-JulesPatterson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0002-JulesPatterson.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directory/ND-0002-JulesPatterson.pdfND.0002ND.00023.0
6093ND.00011845-18530.0UnprovokedCEYLON (SRI LANKA)Eastern ProvinceBelow the English fort, TrincomaleeSwimmingmaleM15FATAL. "Shark bit him in half, carrying away the lower extremities"YNaNNaNS.W. BakerND-0001-Ceylon.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directoryND-0001-Ceylon.pdfhttp://sharkattackfile.net/spreadsheets/pdf_directoryND-0001-Ceylon.pdfND.0001ND.00012.0
6094NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN